Archive for Research

Relief Pitcher Pitch Rankings

To follow the starting pitchers, we have the relief pitcher pitch rankings.

1. Top Ten Four-Seam Fastball (Min 300):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Craig Kimbrel 94.74 2.34 0.23 1.80 4.15
Sean Doolittle 90.81 1.91 0.22 2.02 3.93
Chad Green 85.35 1.30 0.20 2.57 3.87
Anthony Swarzak 78.77 0.58 0.20 2.37 2.95
Josh Fields 89.12 1.72 0.27 0.89 2.61
Pedro Baez 90.00 1.82 0.28 0.78 2.60
Tommy Kahnle 84.53 1.21 0.25 1.34 2.56
Drew Steckenrider 84.55 1.21 0.26 1.13 2.34
Seung Hwan Oh 80.80 0.80 0.24 1.50 2.30
Josh Hader 87.30 1.52 0.28 0.67 2.19

The Stars: Craig Kimbrel, Sean Doolittle, Pedro Baez

Young and Coming: Chad Green, Drew Steckenrider, Josh Hader

Surprises: Anthony Swarzak, Josh Fields, Tommy Kahnle

No surprise that Kimbrel, probably the most dominant reliever of the past few years, is at the top. Jeff Sullivan recently discussed Green’s immense success, both overall and with his fastball, in his second year with the Yankees. Steckenrider is an unknown rookie for the Marlins, but he has been exceptional for them. Hader is a top prospect for the Brewers and future starter, but his stint in the bullpen has gone perfectly. Swarzak is having a career year, so much so that the Brewers traded for him in an attempt to contend. Kahnle has broken out with the White Sox and Yankees.

2. Top Five Two-Seam Fastball (Min 250):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Craig Stammen 67.73 0.49 0.25 1.95 2.44
Kelvin Herrera 81.71 2.71 0.36 -0.52 2.18
Edwin Diaz 75.76 1.76 0.32 0.42 2.18
Joe Kelly 72.95 1.32 0.30 0.79 2.11
Ryan Madson 68.80 0.66 0.28 1.23 1.89

The Stars: Kelvin Herrera, Ryan Madson

Young and Coming: Edwin Diaz

Surprises: Craig Stammen, Joe Kelly

Herrera has been mostly terrible this year, but his track record says he is still a star, and he clearly hasn’t lost anything from his two-seam fastball. Diaz dominated as a rookie but has slowed down a lot this season. He’s still 23, so there is no reason to worry. Stammen didn’t even pitch in MLB in 2016, but he is performing solidly for the Padres. Kelly is having a career year in Boston behind his high-heat fastball.

3. Top Five Cutter Fastball (Min 200):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Jacob Barnes 104.09 1.99 0.22 1.21 3.20
Dominic Leone 99.80 1.62 0.24 0.81 2.43
Kenley Jansen 90.61 0.84 0.22 1.38 2.21
Alex Colome 85.15 0.37 0.20 1.80 2.17
Tommy Hunter 88.07 0.62 0.22 1.32 1.94

The Stars: Kenley Jansen, Alex Colome

Young and Coming: None

Surprises: Dominic Leone, Jacob Barnes, Tommy Hunter

The most famous cutter in the game makes the top five, coming from Dodgers closer Jansen. Colome has continued a breakout from 2016 as the Rays closer. Leone had a great rookie season for the Mariners in 2014, but was knocked around in 2015 and 2016. He has come back nicely in 2017.

4. Top Five Sinker Fastball (Min 200):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Pat Neshek 70.87 1.06 0.25 1.66 2.72
Matt Albers 66.94 0.58 0.24 1.96 2.54
Tony Watson 73.58 1.40 0.28 1.10 2.50
Scott Alexander 76.57 1.77 0.30 0.47 2.24
Richard Bleier 65.97 0.46 0.25 1.68 2.14

The Stars: Pat Neshek

Young and Coming: None

Surprises: Richard Bleier

Neshek, a two-time All-Star, has been spectacular for the Phillies. Bleier, a 30-year-old second-year player, has been unexpectedly good in the majors the past two years.

5. Top Two Splitter Fastball (Min 200):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Blake Parker 101.30 1.30 0.18 1.48 2.78
Chasen Shreve 97.50 0.79 0.18 1.48 2.27

Only nine relievers heavily used the splitter, so this is a small leaderboard. Parker has broken out for the Angels in 2017. Shreve is the third Yankee to appear.

6. Top Five Curveball (Min 200):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
David Robertson 102.86 1.89 0.16 0.73 2.62
Jerry Blevins 95.85 1.28 0.16 0.71 1.99
Ryan Pressly 89.25 0.70 0.12 1.28 1.98
Cody Allen 90.94 0.85 0.15 0.85 1.70
Keone Kela 85.24 0.35 0.13 1.12 1.47

The Stars: David Robertson, Cody Allen

Young and Coming: Keone Kela

Surprises: None

Our fourth Yankee to appear on a leaderboard is Robertson. And none of those four have been Dellin Betances or Aroldis Chapman. Scary. Kela has been one of the only relievers holding the Rangers bullpen afloat.

7. Top Ten Slider (Min 250):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Roberto Osuna 108.02 1.97 0.16 1.52 3.49
Arodys Vizcaino 105.81 1.78 0.16 1.54 3.32
Raisel Iglesias 98.47 1.13 0.14 1.93 3.06
Blake Treinen 105.37 1.74 0.17 1.23 2.97
Pedro Strop 107.08 1.89 0.19 0.97 2.86
Ken Giles 97.17 1.01 0.16 1.57 2.59
James Hoyt 110.74 2.22 0.23 0.19 2.41
Edwin Diaz 99.11 1.18 0.18 1.12 2.31
Adam Morgan 108.19 1.99 0.23 0.16 2.15
Kyle Barraclough 88.13 0.21 0.15 1.67 1.88

The Stars: Roberto Osuna, Pedro Strop, Ken Giles

Young and Coming: Raisel Iglesias, Edwin Diaz

Surprises: James Hoyt

Osuna has been nothing short of excellent for the Blue Jays, manning the closer job for all three of his professional seasons. Still just 22 years old, the best is yet to come. Strop is widely under-appreciated, but he has been a consistent force out of the Cubs bullpen for years. Mariners young stud Edwin Diaz makes his second leaderboard appearance. Hoyt has been terrible for the Astros, so his inclusion is unexpected.

8. Top Three Changeup (Min 200):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Tommy Kahnle 99.96 1.16 0.18 1.59 2.75
Felipe Rivero 105.68 1.86 0.22 0.47 2.33
Chris Devenski 100.35 1.21 0.20 0.89 2.10

(the changeup is not much of a reliever pitch, so this leaderboard is small)

The Stars: Chris Devenski

Young and Coming: Felipe Rivero

Surprises: None

Kahnle appears again. With much-improved stuff, he has been striking out everybody en route to a big breakout season. Devenski is only in his second year, but also in his second year of excellence. The unheralded minor-league starter turned long reliever turned dynamic, versatile setup man has been a star in Houston’s bullpen. His changeup is nicknamed the “Circle of Death,” so no surprise seeing him here. Rivero has been dominant for the Pirates in his third year in the bigs.

Top Fifteen Overall:

Pitch Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
4-Seam Craig Kimbrel 94.74 2.34 0.23 1.80 4.15
4-Seam Sean Doolittle 90.81 1.91 0.22 2.02 3.93
4-Seam Chad Green 85.35 1.30 0.20 2.57 3.87
Slider Roberto Osuna 108.02 1.97 0.16 1.52 3.49
Slider Arodys Vizcaino 105.81 1.78 0.16 1.54 3.32
Cutter Jacob Barnes 104.09 1.99 0.22 1.21 3.20
Slider Raisel Iglesias 98.47 1.13 0.14 1.93 3.06
Slider Blake Treinen 105.37 1.74 0.17 1.23 2.97
4-Seam Anthony Swarzak 78.77 0.58 0.20 2.37 2.95
Slider Pedro Strop 107.08 1.89 0.19 0.97 2.86
Splitter Blake Parker 101.30 1.30 0.18 1.48 2.78
Changeup Tommy Kahnle 99.96 1.16 0.18 1.59 2.75
Sinker Pat Neshek 70.87 1.06 0.25 1.66 2.72
Curveball David Robertson 102.86 1.89 0.16 0.73 2.62
4-Seam Josh Fields 89.12 1.72 0.27 0.89 2.61

Best Pitch: Craig Kimbrel, Boston Red Sox, Four-Seam

Biggest Surprise: Jacob Barnes, Milwaukee Brewers, Cutter

The leaderboard is run by four-seam fastballs and sliders at the top, which is unsurprising considering those are the favorite pitches of relievers. I’ve said this before, but three Yankees in the top 15. And neither of their alleged best two! That’s absurd. Seeing Kimbrel at the top is the exact opposite. Jacob Barnes, however, is crazy too. The unheralded second-year man hasn’t shown much yet, with a 4.00 FIP in 2017. But that cutter is doing something to hitters.

I will add one more post, combining relievers and starters, with some interesting tidbits.


Starting Pitcher Pitch Rankings

As I stated in my earlier article, I would be posting data from my pitch-effectiveness measurement I introduced. Let’s start with the starting pitchers.

1. Top Ten Four-Seam Fastballs (Min 500):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Chris Sale 85.89 3.08 0.24 2.86 5.94
Jacob deGrom 83.06 2.68 0.27 2.13 4.81
Jose Berrios 74.74 1.51 0.27 1.97 3.48
Jimmy Nelson 76.65 1.78 0.30 1.34 3.12
Jeff Samardzija 75.97 1.68 0.30 1.34 3.02
Max Scherzer 73.97 1.40 0.29 1.55 2.95
Chase Anderson 74.24 1.44 0.29 1.45 2.89
Rick Porcello 77.50 1.90 0.31 0.87 2.77
James Paxton 73.32 1.31 0.29 1.42 2.73
Danny Salazar 80.27 2.29 0.33 0.42 2.71

The Stars: Chris Sale, Jacob deGrom, Max Scherzer, James Paxton

Young and Coming: Jose Berrios

Surprises: Rick Porcello, Chase Anderson, Jeff Samardzija

This group includes some bona fide talent and some surprises. Porcello’s 1.90 Z-score on Sw+Whf% jumps out, considering his lack of stuff and general pitch-to-contact approach. Anderson is quietly putting together a solid season, with a 2.88 ERA in 122 innings of work. Samardzija’s incredible strikeout and walk peripherals have been well documented this year.
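For readers who want to see the arithmetic behind the Z Total column, here is a minimal Python sketch. It assumes each z-score is taken against all qualifying pitches of that type, and that the xwOBA z-score has its sign flipped so that a lower xwOBA scores higher. The tables are consistent with that reading (Sale's 3.08 and 2.86 sum to 5.94), but the function names are mine and the three rows are only a tiny illustrative sample, so the output will not reproduce the published column.

```python
from statistics import mean, pstdev


def z_scores(values, higher_is_better=True):
    """Standardize values; flip the sign when lower raw values are better."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]


def z_total(sw_whf, xwoba):
    """Sum the Sw+Whf% z-score and the sign-flipped xwOBA z-score per pitcher."""
    z_sw = z_scores(sw_whf, higher_is_better=True)
    z_xw = z_scores(xwoba, higher_is_better=False)
    return [a + b for a, b in zip(z_sw, z_xw)]


# Illustrative sample only: the top three four-seam rows (Sale, deGrom, Berrios).
# The real z-scores use the full qualifying population, which is not shown here.
sw_whf = [85.89, 83.06, 74.74]
xwoba = [0.24, 0.27, 0.27]
totals = z_total(sw_whf, xwoba)
```

Sorting pitchers by `totals` in descending order would then give the leaderboard order for that pitch type.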

2. Top Ten Two-Seam Fastballs (Min 300):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Sonny Gray 72.12 2.18 0.30 1.39 3.57
Jaime Garcia 67.96 1.49 0.28 1.97 3.46
David Price 72.83 2.29 0.32 0.86 3.15
Lance Lynn 66.66 1.27 0.31 1.16 2.43
Matt Garza 65.31 1.05 0.30 1.34 2.39
Luis Castillo 64.66 0.94 0.30 1.44 2.38
Chris Sale 65.23 1.04 0.30 1.34 2.38
Jameson Taillon 69.98 1.82 0.34 0.40 2.23
J.A. Happ 63.82 0.80 0.30 1.29 2.09
Julio Teheran 69.27 1.71 0.35 0.20 1.91

The Stars: Sonny Gray, David Price, Chris Sale, Julio Teheran

Young and Coming: Jameson Taillon, Luis Castillo

Surprises: Jaime Garcia, Matt Garza

We see Sale again, which, considering what he has done this year, is not surprising. Garza has been generally terrible this year, so his inclusion in this list is unexpected. Castillo, a rookie for the Cincinnati Reds, has pieced together some quality starts out of the spotlight.

3. Top Five Cut Fastballs (Min 200):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
James Paxton 89.03 1.81 0.20 2.03 3.84
Corey Kluber 97.90 2.82 0.28 0.48 3.30
Tyler Chatwood 84.08 1.25 0.21 1.81 3.06
John Lackey 84.72 1.32 0.26 0.85 2.17
Zack Godley 78.94 0.66 0.24 1.39 2.05

(Only five because of the limited use of cutters)

The Stars: James Paxton, Corey Kluber

Young and Coming: Zack Godley

Surprises: Tyler Chatwood

We see Paxton again, who has established himself as a star this season. Godley has been great for the Arizona Diamondbacks, and Tyler Chatwood has been poor for the Colorado Rockies.

4. Top Five Sinker Fastball (Min 200):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Trevor Williams 68.72 1.87 0.30 1.73 3.61
Jimmy Nelson 65.69 1.43 0.32 1.11 2.54
Jose Quintana 64.77 1.29 0.32 1.18 2.47
Jon Lester 61.89 0.87 0.31 1.29 2.17
Jake Arrieta 58.31 0.35 0.31 1.43 1.78

(Only five because of the limited use of sinkers)

The Stars: Jake Arrieta, Jon Lester, Jose Quintana

Young and Coming: Trevor Williams

Surprises: None

An emerging starter for the Pittsburgh Pirates, an emerging ace for the Milwaukee Brewers, and…three Chicago Cubs. I gave the Cubs pitchers the benefit of the doubt and put them under “The Stars” category, but they may have pitched their way out of there this season.

5. Top Two Splitter Fastball (Min 200):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Kevin Gausman 94.79 0.96 0.21 1.61 2.57
Ricky Nolasco 95.42 1.02 0.22 1.35 2.37

The splitter leaderboard included only nine starters, so this one is short. Kevin Gausman has rebounded from a horrendous start to be solid, and Ricky Nolasco has continued to provide what he always has: mediocrity.

6. Top Ten Curveball (Min 300):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Corey Kluber 109.61 3.16 0.12 2.26 5.42
Charlie Morton 88.69 1.30 0.17 1.44 2.74
James Paxton 84.54 0.93 0.16 1.49 2.42
Zack Godley 93.67 1.74 0.22 0.60 2.35
Aaron Nola 87.91 1.23 0.19 1.07 2.30
Carlos Carrasco 88.65 1.30 0.19 0.99 2.28
Ivan Nova 84.32 0.91 0.18 1.21 2.12
James Shields 91.18 1.52 0.22 0.50 2.02
Alex Meyer 82.68 0.76 0.19 1.07 1.84
Jon Lester 89.57 1.38 0.22 0.45 1.82

The Stars: Corey Kluber, James Paxton, Carlos Carrasco

Young and Coming: Zack Godley

Surprises: James Shields, Alex Meyer, Jon Lester, Charlie Morton

We see Kluber again, and Godley again, and Paxton for a third time. No surprise considering the seasons they have put up. Shields’ days as a front-of-the-rotation starter are far behind him. Meyer has quietly put together some solid starts for the Los Angeles Angels as a complete unknown. Lester is a surprise here because this is his second leaderboard appearance, and he has not pitched well. Morton is mostly known for his injury problems, but he has developed some of the best “stuff” in the game in his first year in Houston.

7. Top Ten Slider (Min 300):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Carlos Carrasco 108.62 2.51 0.15 2.06 4.56
Max Scherzer 104.66 2.10 0.17 1.79 3.89
Sonny Gray 97.27 1.35 0.16 1.87 3.22
Dylan Bundy 99.46 1.58 0.19 1.28 2.85
Clayton Kershaw 101.38 1.77 0.22 0.82 2.59
Patrick Corbin 94.91 1.11 0.19 1.24 2.35
Marcus Stroman 96.92 1.32 0.21 1.03 2.34
Zack Greinke 104.05 2.04 0.24 0.30 2.34
Mike Clevinger 96.96 1.32 0.21 1.01 2.33
Mike Leake 96.40 1.27 0.21 0.93 2.20

The Stars: Carlos Carrasco, Max Scherzer, Sonny Gray, Clayton Kershaw, Marcus Stroman, Zack Greinke

Young and Coming: Dylan Bundy, Mike Clevinger

Surprises: Patrick Corbin

Finally! The man we have been waiting to see, Kershaw, makes his first appearance, and Scherzer shows up again. The star power of this group is by far the strongest. Bundy has been “Young and Coming” for what seems like decades, and no one knows whether the flashes will ever turn into consistency. He is still just 24 years old, though, so I will keep my hopes up. Clevinger has been a nice surprise for the Cleveland Indians, and Corbin has bounced back from a miserable 2016 to be solid for the Arizona Diamondbacks.

8. Top Ten Changeup (Min 300):

Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
Stephen Strasburg 104.30 2.31 0.15 2.76 5.07
Luis Castillo 97.25 1.46 0.18 2.27 3.73
Danny Salazar 102.60 2.11 0.23 1.01 3.12
Kyle Hendricks 95.35 1.23 0.22 1.25 2.49
Max Scherzer 90.38 0.63 0.20 1.72 2.35
Edinson Volquez 91.28 0.74 0.21 1.54 2.28
Carlos Carrasco 86.47 0.16 0.19 1.90 2.06
Eduardo Rodriguez 95.70 1.28 0.26 0.48 1.76
Jason Vargas 91.99 0.83 0.26 0.46 1.29
Cole Hamels 93.09 0.96 0.27 0.24 1.20

The Stars: Stephen Strasburg, Kyle Hendricks, Max Scherzer, Carlos Carrasco, Cole Hamels

Young and Coming: Luis Castillo, Eduardo Rodriguez

Surprises: Edinson Volquez

Scherzer again, which makes me feel better about the validity of this work. Carrasco appears for the third time in a row; his breaking and offspeed stuff is killer. Very few people outside of Cincinnati know Castillo, but this is the rookie’s second leaderboard appearance. Rodriguez has continued to flash this year, but injuries and inconsistency continue for the young Red Sock. Volquez is still embracing his mediocrity.

Starters Top Fifteen Overall:

Pitch Player Sw+Whf% Sw+Whf% Z xwOBA xwOBA Z Z Total
4-Seam Chris Sale 85.89 3.08 0.24 2.86 5.94
Curveball Corey Kluber 109.61 3.16 0.12 2.26 5.42
Changeup Stephen Strasburg 104.30 2.31 0.15 2.76 5.07
4-Seam Jacob deGrom 83.06 2.68 0.27 2.13 4.81
Slider Carlos Carrasco 108.62 2.51 0.15 2.06 4.56
Slider Max Scherzer 104.66 2.10 0.17 1.79 3.89
Cutter James Paxton 89.03 1.81 0.20 2.03 3.84
Changeup Luis Castillo 97.25 1.46 0.18 2.27 3.73
Sinker Trevor Williams 68.72 1.87 0.30 1.73 3.61
2-Seam Sonny Gray 72.12 2.18 0.30 1.39 3.57
4-Seam Jose Berrios 74.74 1.51 0.27 1.97 3.48
2-Seam Jaime Garcia 67.96 1.49 0.28 1.97 3.46
Cutter Corey Kluber 97.90 2.82 0.28 0.48 3.30
Slider Sonny Gray 97.27 1.35 0.16 1.87 3.22
2-Seam David Price 72.83 2.29 0.32 0.86 3.15

Best Pitch: Chris Sale, Boston Red Sox, 4-Seam Fastball

Best Repertoire: Corey Kluber, Cleveland Indians

Biggest Surprise: Luis Castillo, Cincinnati Reds, Changeup

This list is almost all household names. In first and second, we have the AL Cy Young frontrunners. Jeff Sullivan recently wrote an article about Kluber’s curveball, and how it may be the best pitch in baseball. It isn’t number one here, but second place is not too shabby. His cutter also appears here, so his dominance is not hard to explain. Sonny Gray’s stuff is well known, and he shows up twice on this table, but his numbers are not spectacular this year. Lastly, watch out for Castillo. He’s a no-name rook, but he has been solid for the Reds, and the ranking of his changeup may be the evidence to support his success.

Next up are the relievers.


An Alternate Look at Ground Ball “Luckiness”

Earlier this season, Baseball Savant unveiled expected wOBA, which, around these parts of the Internet, has made some real waves. For those unfamiliar, expected wOBA, or xwOBA, predicts a batter’s wOBA from the launch angles and exit velocities of his in-play contact. Because certain speeds and angles are more conducive to hits (for instance, most consider an ideal launch angle to be around 25 degrees), xwOBA is often interpreted as a rough measure of luck. In particular, the difference between a player’s expected and actual wOBA (referred to as xwOBA-wOBA) is often cited in discussions of just how “lucky” that player has been. If a hitter’s xwOBA is significantly higher than his actual wOBA, for example, one can deduce that he’s hit the ball far better than his actual results imply.

A few months ago, Craig Edwards wrote an excellent piece on the new statistic, and discussed the interaction between xwOBA-wOBA and player speed. He noted that most of the “luckiest” batters — those with negative xwOBA-wOBA figures — were generally some of the faster players in the league, and the least lucky batters were among the slowest. Intuitively, this makes sense, as faster players are more likely to beat out infield hits and take extra bases when given the opportunity.

Edwards also charted players’ xwOBA-wOBA against their BsR scores, producing a linear-looking graph (with an R-squared of 0.27) which confirmed at least a moderate link between the two statistics. He noted that because there was no “perfect metric” for player speed at the time, he chose to use BsR as a proxy. While BsR serves this purpose well enough, I do find it problematic that the statistic, by definition, includes runners “taking the extra base,” as this information is also reflected in the wOBA element of xwOBA-wOBA (i.e. when a batter stretches a would-be single into a double, his wOBA is that of a double, while his xwOBA remains at a single). I’d be more comfortable, therefore, comparing xwOBA-wOBA against a more “pure” form of player speed.

It’s fortunate, then, that in the time since Edwards’ piece, Baseball Savant has also released sprint speed, which captures a player’s feet traveled per second on a “maximum effort” play. Using a list of batters with at least 200 at-bats on the season, I’ve re-created the scatterplot used in Edwards’s article, replacing BsR with sprint speed:

[Chart: xwOBA-wOBA plotted against sprint speed]

As it turns out, the results are fairly similar — there is a link, albeit not an incredibly strong one, between a hitter’s speed and his xwOBA-wOBA. The trend is downward-sloping, meaning that faster batters are luckier, but there’s still a lot of scatter around the line of best fit. The highest point on the graph, belonging to Tigers slugger Miguel Cabrera, is particularly far from the trend line, as his 66-point xwOBA-wOBA is far above the expected difference of around zero.

I should also note that the above scatterplot, with an R-squared of 0.16, shows a notably weaker fit than Edwards’s chart did. The plot did get me wondering, however, how much stronger or weaker the relationship would be for different hit types. Common sense suggests that batter speed, as it relates to xwOBA-wOBA, plays a much more significant role on ground balls than on balls hit in the air. After all, a lazy fly ball to left field will be caught whether hit by Byron Buxton (tied for the fastest batter in the league) or Albert Pujols (the slowest), but Buxton will reach far more often on a weak ground ball to the pitcher:

[GIF: Buxton reaching on a ground ball]

Again using the all-powerful Baseball Savant search tool, I gathered separate xwOBA-wOBA figures for fly balls, line drives, and grounders. Now, let’s see how the interaction between player speed and xwOBA-wOBA changes based on hit category:

[Chart: xwOBA-wOBA plotted against sprint speed, by hit type]

There’s virtually no relationship at all for either fly balls or line drives (indeed, neither simple linear regression has an R-squared significantly above zero), but ground balls are a different story. Not only is the smoothed line for grounders much steeper than for either of the other two hit types, but its R-squared is nearly 0.31. While this is by no means a strong fit, it does confirm a link between ground ball “luckiness” and player speed.
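The per-hit-type comparison above boils down to fitting a straight line for each hit type and reading off its R-squared. Here is a rough Python sketch of that step, written with plain least squares so it needs no libraries. The numbers are made up for illustration (they are not Statcast data), and `fit_line` is just a helper name I chose.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up values for five hypothetical hitters; not real data.
sprint_speed = [24.0, 25.5, 27.0, 28.5, 30.0]  # feet per second
gaps_by_hit_type = {
    "ground_ball": [0.060, 0.035, 0.010, -0.020, -0.045],  # xwOBA-wOBA
    "fly_ball": [0.010, -0.015, 0.020, -0.010, 0.005],
}
fits = {
    hit_type: fit_line(sprint_speed, gaps)
    for hit_type, gaps in gaps_by_hit_type.items()
}
```

In this toy data the ground-ball fit has a clearly negative slope and a high R-squared, while the fly-ball fit is essentially flat, which is the qualitative pattern described above.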

Because we now know that we should expect faster players to outperform their respective xwOBAs on ground balls (and vice versa), it may also be appropriate to adjust batters’ xwOBA-wOBA figures accordingly. Using the results of the simple linear regression for ground balls, I’ve calculated the difference between each major-league batter’s actual xwOBA-wOBA and his expected xwOBA-wOBA as per the regression. I’ve called the stat “Actual Less Expected xwOBA-wOBA” (It’s a mouthful, I know; let’s just agree to call it ALE xwOBA-wOBA), and while it’s a pretty rough measure, it provides us with a speed-neutral valuation of batters’ ground-ball “luckiness.” A high ALE xwOBA-wOBA indicates misfortune; Brandon Belt, for instance, has an actual xwOBA-wOBA 161 points higher than his sprint speed would suggest. Full lists of batters with the highest and lowest ALE xwOBA-wOBAs are as follows:

[Table: batters with the highest and lowest ALE xwOBA-wOBA]
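In code, ALE xwOBA-wOBA is just the residual from the ground-ball regression: the actual gap minus the gap the line predicts for that sprint speed. The sketch below uses hypothetical coefficients and a hypothetical hitter, since the fitted slope and intercept are not listed in this post.

```python
def ale_gap(sprint_speed, actual_gap, slope, intercept):
    """Actual xwOBA-wOBA minus the gap predicted from sprint speed."""
    expected_gap = slope * sprint_speed + intercept
    return actual_gap - expected_gap


# Hypothetical regression line: the gap falls 15 points per extra ft/s.
slope, intercept = -0.015, 0.405

# Hypothetical slow hitter: 25.0 ft/s with a 100-point ground-ball gap.
slow_hitter = ale_gap(
    sprint_speed=25.0, actual_gap=0.100, slope=slope, intercept=intercept
)
```

With these invented inputs the line expects a 30-point gap at 25.0 ft/s, so the hitter's ALE comes out to roughly 70 points of speed-neutral bad luck.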

Finally, I multiplied each batter’s ALE xwOBA-wOBA figure by his ground-ball rate, as per FanGraphs (multiplied by 100 for aesthetic purposes). This should show us which batters have been the most and least lucky in the context of their own respective batted-ball profiles. As shown below, there are a lot of familiar names in these weighted ALE xwOBA-wOBA lists, but there are also a few differences:

[Table: batters with the highest and lowest weighted ALE xwOBA-wOBA]
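The weighting step is a single multiplication, sketched below. I'm assuming here that ground-ball rate is expressed as a fraction between 0 and 1 before the factor of 100 is applied; the two hitters are hypothetical.

```python
def weighted_ale(ale, ground_ball_rate):
    """Scale ALE xwOBA-wOBA by ground-ball rate (a 0-1 fraction), times 100."""
    return ale * ground_ball_rate * 100


# Two hypothetical hitters with the same ALE but different batted-ball mixes.
ground_ball_heavy = weighted_ale(ale=0.050, ground_ball_rate=0.55)
fly_ball_heavy = weighted_ale(ale=0.050, ground_ball_rate=0.35)
```

The same 50-point ALE counts for more when a hitter puts more than half of his balls in play on the ground, which is why the weighted lists can differ from the unweighted ones.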

As mentioned above, an R-squared of 0.31 isn’t big enough to draw any major conclusions. Even so, there’s value in controlling for player speed in any discussion of players outperforming or underperforming their expected wOBAs. By accounting for batters’ sprint speeds, we can get a purer look at which players have actually been the beneficiaries of good luck, and which batters’ negative xwOBA-wOBA on ground balls have resulted from their foot speed. Further, it helps to highlight players who actually have been unlucky; if a player has a ground-ball ALE xwOBA-wOBA close to zero, but a high overall xwOBA-wOBA, they’ve been hitting much higher-quality fly balls and line drives — neither of which are significantly impacted by player speed — than their results indicate. Miguel Cabrera, for instance, falls into that category; while his ground-ball ALE xwOBA-wOBA is relatively close to zero (indicating that he hasn’t benefited from any speed-neutral luck or unluck on grounders) his fly-ball xwOBA-wOBA is a whopping 0.166. So, even though Miggy isn’t one of the faster baserunners in the league, he’s still got a legitimate gripe against Lady Luck — and now, we can see which other batters do, too.


Reverse Engineering Swing Mechanics from Statcast Data

There’s no question that Statcast has revolutionized the way we think about hitting. Now in year three of the Statcast era, everyone from players to stat-heads to the average fan is talking about exit velocities and launch angles. But what can a player do to improve both their exit velocity and launch angle? It all comes down to the mechanics of the swing.

The next great revolution in baseball is leveraging data about swing mechanics to optimize exit velocities and launch angles. It’s a revolution that has already begun. Using technologies developed by companies like Zepp, Blast Motion, and Diamond Kinetics, players and coaches can now get detailed analyses of every swing during practice. Teams are already starting to integrate these swing analyses into their player-development programs. However, none of these sensors are currently being used during MLB games.

It’s only a matter of time before MLB starts tracking swing data during games, but until then we can use Statcast data and a little physics to reverse engineer the mechanics of the swing. A couple of weeks ago, Eno Sarris and Andrew Perpetua wrote some great articles about the importance of making contact out in front of the plate and how we can infer the contact point from Statcast data. Other than contact point, what are the other important characteristics of a swing? Well, let’s look at Eno’s favorite graphic, from the time Zepp analyzed his swing:

It all comes down to swing speed, attack angle, and timing! The time to impact is probably impossible to get from the Statcast data, so let’s focus on the two remaining metrics: swing speed and attack angle.

Swing speed

Statcast doesn’t measure swing speed directly, but nonetheless reports an estimated swing speed, computed using an algorithm with all the transparency of a black box. In fact, it’s so secretive that estimated swing speeds have all but disappeared from Baseball Savant in recent weeks. Just to find the data, I had to dig up a couple of the saved searches from Alex Chamberlain’s article from a few weeks ago on that topic. Here is the leaderboard of the fastest average estimated swing speeds as reported in that article:

Hitter Average Estimated Swing Speed, 2015-17
Player Year AB MPH
Giancarlo Stanton 2015 437 66.5
Aaron Judge 2017 406 66.1
Nelson Cruz 2016 325 65.5
Giancarlo Stanton 2016 192 64.8
Miguel Cabrera 2016 342 64.8
SOURCE: Baseball Savant/Statcast

Eno swings like Giancarlo Stanton!

Now, I don’t want to shatter anyone’s dreams of blasting a home run off of a Major League pitcher, but something is clearly off about the data. It turns out that not all reported bat speeds are equal. Physics tells us that as the bat rotates, the barrel (the end) of the bat moves the fastest and that the bat speed decreases in an approximately linear fashion as we move toward the hands. According to Patrick Cherveny, the lead biomechanist for Blast Motion, which is the official swing sensor of MLB, measuring the barrel speed is essentially meaningless:

“We see some swing speeds where people claim that you get into the 90s. That would make sense if it’s at the end of the bat, but if you hit it at the end of the bat, it’s not going to travel as far because some of the energy is lost in the bat’s vibration. So that kind of a swing speed is essentially ‘false.’ Swing speed is dependent on where you’re measuring on the bat. In order to maximize quality of contact, the best hitters want to hit the ball in the ‘sweet spot’ of the bat.”

Measuring the speed of the bat at the sweet spot, a two-inch-long area whose center is located six inches from the barrel of the bat, Blast Motion reports that MLB players swing the bat between 65 and 85 MPH. Zepp, on the other hand, reports the barrel speed, which accounts for its elevated values. Still, none of the swing-tracking devices on the market report swing speeds as low as those estimated by Statcast.

Let’s see if we can uncover more information about the black-box algorithm used by Statcast to estimate swing speeds. A quick linear regression between average estimated swing speed and average exit velocity for all batters with at least 100 batted ball events (BBE) in a season from 2015-2017 yields an R² of 0.99. Wow! Statcast estimates swing speeds almost entirely from exit-velocity data. No wonder the names at the top of the list are so obvious.

Exit velocity, however, isn’t the only velocity measured by Statcast. We also know the speed of the pitch as it is released from the pitcher’s hand. Thinking about the physics, the bat transfers energy and momentum to the oncoming ball at the point where the bat collides with the ball. Thus, any estimation of swing speed based on Statcast’s EV and pitch speed data represents the speed of the bat at the point where it makes contact with the ball. Since hitters want to hit the ball at the sweet spot, swing speeds estimated from Statcast data should fall in approximately the same range as those measured by Blast Motion.

Much of the research on the physics of bat-ball collisions has been conducted by Dr. Alan Nathan, so let’s start with one of his equations:

$$EV = e_A v_{ball} + (1 + e_A) v_{bat}$$

where $EV$ is the exit velocity, $v_{ball}$ is the velocity of the ball before it hits the bat, and $v_{bat}$ is the velocity of the bat. Here $e_A$ is a fudge factor called the collision efficiency. It depends on the COR of the ball (which was at the center of the juiced-ball controversy), the physical properties of the bat, and the point on the bat at which the bat strikes the ball. Thus, assuming all MLB players use a standard ball and bat, $e_A$ can be viewed as a measure of quality of contact. Nathan found that at the sweet spot of a wood bat, $e_A = 0.2$. Using that value of $e_A$ and the release speed and exit velocities from Statcast, we can estimate the bat speed for every ball in play. According to Nathan’s pitch-trajectory calculator, the average pitch slows down by 8.4% from the release point to when it crosses the plate, so we’ll also make that adjustment to the release speed reported by Statcast. Here’s the relationship between our physics-based model for swing speed and the estimated swing speed from Statcast/Baseball Savant:

Look at that! When you get a slope of 1 and an intercept of about 0, you know you’ve hit the nail on the head. This must be the equation that Statcast is using to estimate swing speed. After doing a little digging, it appears that Nathan gave them that exact formula, but assumed that the pitch slows down by 10% by the time it crosses the plate.
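For anyone who wants to try this at home, the physics-based estimate is just Nathan's equation solved for bat speed, with the release speed knocked down to account for drag. Here is a minimal Python sketch. The function and parameter names are my own, and the 93 MPH pitch and 100 MPH exit velocity are round numbers chosen for illustration.

```python
def estimate_bat_speed(exit_velocity, release_speed, e_a=0.2, slowdown=0.084):
    """Solve EV = e_A * v_ball + (1 + e_A) * v_bat for v_bat (all in MPH)."""
    v_ball = release_speed * (1.0 - slowdown)  # pitch speed at the plate
    return (exit_velocity - e_a * v_ball) / (1.0 + e_a)


# A 93 MPH pitch hit at 100 MPH, assuming sweet-spot contact (e_A = 0.2).
bat_speed = estimate_bat_speed(exit_velocity=100.0, release_speed=93.0)

# The same swing with the 10% slowdown that Statcast apparently assumes.
bat_speed_statcast = estimate_bat_speed(100.0, 93.0, slowdown=0.10)
```

With these inputs the estimate lands a little above 69 MPH either way, so the choice between an 8.4% and a 10% slowdown moves the answer by only a fraction of a mile per hour.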

The problem with this algorithm is it assumes that the hitter always hits the ball at the sweet spot. Nathan’s paper actually shows that $e_A$ varies linearly as a function of EV, from about -0.1 for the weakest-hit balls to 0.21 for the best-hit, depending on how far from the barrel the bat collides with the ball. To get a good estimate of swing speed, we’ll need to get a better estimate of $e_A$. Unfortunately, $e_A$ must be computed independently for every hitter due to inherent differences in a hitter’s strength. For instance, when Giancarlo Stanton hits a ball with an EV of 100 MPH, he is making weaker contact than when Billy Hamilton hits a ball 100 MPH.

I calibrated $e_A$ for each hitter with at least 100 BBE in a season by assuming that, for each player, the average of the top 15 BBE by exit velocity corresponds to $e_A = 0.21$ and the average of the bottom 15 BBE by exit velocity corresponds to $e_A = -0.1$. Since $e_A$ and EV are related linearly, we can compute $e_A$ from EV for each player. Finally, I will assume that every player uses a standard 34 in., 32 oz. bat. Since Nathan’s study used a 34 in., 31 oz. bat, I subtracted 0.42 MPH from the estimated swing speeds, because every extra ounce reduces bat speed by about 0.42 MPH. Here’s a look at our new average estimated swing speeds:

We see that swing speed still correlates strongly with exit velocity, but with a much more reasonable R² value of 0.81. Much of the remaining variance is due to the quality of contact, as estimated by $e_A$. The colors here show the soft-hit rates from FanGraphs. We can see not only that slower swing speeds result in more soft contact, but also that the regression line strongly divides hitters based on their soft-contact rates. Hitters above the line tend to make better contact and hit the ball more efficiently than those below the line, given their swing speeds.
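The per-hitter calibration described above can be sketched in a few lines of Python. This is my own reconstruction under the stated assumptions (top and bottom 15 batted balls pinned to 0.21 and -0.1, a straight line in between, and a 0.42 MPH bat-weight offset). The exit velocities are synthetic, and the function names are made up for the example.

```python
def calibrate_e_a(exit_velocities, n_tail=15, e_low=-0.1, e_high=0.21):
    """Return a function mapping EV to e_A for one hitter-season.

    The mean of the n_tail softest batted balls is pinned to e_low and the
    mean of the n_tail hardest to e_high, with a straight line in between.
    """
    ordered = sorted(exit_velocities)
    ev_low = sum(ordered[:n_tail]) / n_tail
    ev_high = sum(ordered[-n_tail:]) / n_tail
    slope = (e_high - e_low) / (ev_high - ev_low)
    return lambda ev: e_low + slope * (ev - ev_low)


def swing_speed(exit_velocity, release_speed, e_a, bat_adjustment=0.42):
    """Bat speed at contact from Nathan's equation, less the bat-weight offset."""
    v_ball = release_speed * (1.0 - 0.084)  # 8.4% slowdown to the plate
    return (exit_velocity - e_a * v_ball) / (1.0 + e_a) - bat_adjustment


# Synthetic exit velocities for one hypothetical hitter (100 batted balls).
evs = [60.0 + 0.5 * i for i in range(100)]  # 60.0 to 109.5 MPH
e_a_for = calibrate_e_a(evs)
speed = swing_speed(exit_velocity=100.0, release_speed=93.0, e_a=e_a_for(100.0))
```

One caveat of this simple line: a batted ball softer or harder than the tail averages gets an $e_A$ slightly outside the -0.1 to 0.21 range, so a real implementation might want to clip it.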

Knowing the value of $e_A$ also gives us an estimate of where the ball hit the bat in relation to the barrel. Nathan found that $e_A \sim d^2$, where $d$ is the distance from the barrel. Since a quadratic function has no unique inverse, we’re forced to infer $d$ from our computed values of $e_A$ by assuming a linear relationship between the two variables. Once we know where the ball struck the bat, we can also estimate the barrel speed and hand speed, assuming that those speeds are proportional to distance from the axis of rotation.
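The proportionality assumption is easy to express in code: if the bat rotates rigidly about some axis, the speed at any point scales with its distance from that axis. The sketch below uses placeholder distances that I picked purely for illustration; they are not the values used to build the numbers in this post.

```python
def speed_along_bat(contact_speed, contact_radius, target_radius):
    """Scale speed by distance from the rotation axis (rigid rotation, v = w * r)."""
    return contact_speed * target_radius / contact_radius


# Hypothetical geometry, in inches from the axis of rotation (assumptions only).
CONTACT_RADIUS = 30.0  # assumed point of contact
BARREL_RADIUS = 36.0   # assumed end of the bat
HANDS_RADIUS = 11.0    # assumed position of the hands

# An illustrative 71.7 MPH swing speed at the point of contact.
barrel = speed_along_bat(71.7, CONTACT_RADIUS, BARREL_RADIUS)
hands = speed_along_bat(71.7, CONTACT_RADIUS, HANDS_RADIUS)
```

Because everything scales off the contact point, any error in the inferred contact location carries straight through to the barrel and hand estimates.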

League Average Estimated Swing Speeds (MPH), 2015-17
Point of Contact Barrel Hands
Year Min Avg Max Min Avg Max Min Avg Max
2015 63.9 71.9 83.3 76.3 85.8 98.9 22.8 26.7 32.2
2016 63.7 72.2 80.8 76.2 86.2 95.5 22.9 26.8 31.0
2017 63.0 71.1 78.6 75.3 84.9 93.8 22.5 26.4 30.7
Overall 63.0 71.7 83.3 75.3 85.7 98.9 22.5 26.6 32.2
SOURCE: Baseball Savant/Statcast. Players with min 100 BBE in a season

I have no idea how accurate these estimates are, but they look pretty good! The swing speeds at the point of contact line up nicely with those from Blast Motion (65-85 MPH range and league average of 70 MPH), as do the barrel speeds (Zepp claims 75-95 MPH) and hand speeds (Blast Motion says 23-29 MPH). There’s a lot more uncertainty in the barrel and hand speeds than at the point of contact, because they require additional assumptions about bat size, axis of rotation, and distance from barrel of the point of contact. Even with all of those assumptions, the accuracy probably isn’t much worse than that of the swing-tracking devices on the market today, which claim an uncertainty of about 3-7 MPH for individual swings.

Here are the fastest and slowest average swing speeds in a season during the Statcast era:

Hitter Average Estimated Swing Speeds (MPH), 2015-17
Player Year BBE Point of Impact (MPH) Barrel (MPH) Hands (MPH)
Giancarlo Stanton 2015 187 83.3 98.9 32.2
Rickie Weeks Jr. 2016 127 80.8 95.5 29.5
Giancarlo Stanton 2016 275 80.3 95.5 31.0
Greg Bird 2015 107 80.2 95.2 30.4
Gary Sanchez 2016 145 80.1 95.0 29.9
Kelby Tomlinson 2017 131 63.8 76.3 24.1
Dee Gordon 2017 497 63.8 76.2 23.2
Shawn O’Malley 2016 152 63.7 76.2 23.2
Mallex Smith 2017 178 63.5 75.6 22.6
Billy Hamilton 2017 436 63.0 75.3 22.5
SOURCE: Baseball Savant/Statcast. Players with min 100 BBE in a season

At the top of the list we see some well-known sluggers and … Rickie Weeks? Who knew he had such elite bat speed? Unfortunately for him, his average eA in 2016 was the lowest of any player in the Statcast era, indicating that he was making a ton of weak contact. Weeks is the quintessential over-swinger, whose impressive bat speed is often nullified by a lack of bat control. That’s completely unsurprising for a player whose 2016 highlight reel features at least one hack that would make even Charlie Brown blush:

 

I was also going to include a table of all of the fastest individual swings, until it turned into an exercise in how many times I can write Giancarlo Stanton’s name. He has 18 of the top 19 swings by barrel speed, which tops out at 108 MPH.

Attack Angle

Unlike swing speed, Statcast doesn’t give us an estimate of attack angle. Instead, we’ll again turn to some research done by Dr. Alan Nathan, this time from his 2017 Saberseminar presentation. To better understand the geometry of the bat-ball collision, let’s look at a diagram from his presentation:

The attack angle, or swing plane, is the angle that the bat is moving at when it hits the ball. Drawing a line between the centers of the bat and ball at the time of impact defines a second angle, called the centerline angle. When a hitter swings the bat such that the attack angle lines up with the centerline angle, he generates his maximum exit velocity and launches the ball at an angle equal to that of the attack angle.

Armed with this information, we can compute the attack angle by looking at the launch angles when a hitter produces his highest exit velocities. Nathan does this by plotting EV against LA for each hitter (below is his figure for Khris Davis’s BBE, whose attack angle is about 20°). He then divides the data, presumably binning the data by launch angle and then pulling out the top few BBE by exit velocity in each bin (red points). Once the data has been divided, a parabola can be fit to the red points, such that the attack angle corresponds to the peak of the parabola.

I found that the computed attack angle is fairly sensitive to the number of bins and number of data points in each bin, so this method is far from perfect. Ultimately, I chose the number of bins based on each player’s standard deviation in launch angle (~3° bins), and selected the top 20% of data points by exit velocity. I then computed a second version of attack angle by averaging the launch angles of the top 15 BBE by exit velocity (just as I did when computing swing speeds). Finally, I averaged the values from the two different methods to get a final value for the attack angle.
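Here is a rough sketch of the two-method estimate described above. It assumes that the top 20% by exit velocity is taken within each launch-angle bin, that bins are a fixed 3° wide, and that every bin contributes at least one point; the function names are hypothetical, and the parabola is fit by ordinary least squares.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def parabola_vertex(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns the vertex x."""
    s = [sum(x ** k for x in xs) for k in range(5)]   # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = _det3(m)
    # Cramer's rule for the coefficients a and b
    a = _det3([[rhs[i]] + m[i][1:] for i in range(3)]) / d
    b = _det3([[m[i][0], rhs[i], m[i][2]] for i in range(3)]) / d
    return -b / (2.0 * a)


def preferred_attack_angle(bbe, bin_width=3.0, top_frac=0.2, n_top=15):
    """bbe: list of (launch_angle_deg, exit_velocity_mph) tuples."""
    # Method 1: bin by launch angle, keep the hardest-hit balls per bin,
    # and take the peak of a parabola fit through those points.
    bins = {}
    for la, ev in bbe:
        bins.setdefault(math.floor(la / bin_width), []).append((la, ev))
    pts = []
    for members in bins.values():
        members.sort(key=lambda p: p[1], reverse=True)
        keep = max(1, int(top_frac * len(members)))
        pts.extend(members[:keep])
    fit_angle = parabola_vertex([p[0] for p in pts], [p[1] for p in pts])

    # Method 2: mean launch angle of the n_top hardest-hit balls.
    hardest = sorted(bbe, key=lambda p: p[1], reverse=True)[:n_top]
    mean_angle = sum(p[0] for p in hardest) / len(hardest)

    # Final value: simple average of the two methods.
    return (fit_angle + mean_angle) / 2.0
```

Changing `bin_width` or `top_frac` on real data is a quick way to see the sensitivity mentioned above.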

This method of computing the attack angle gives us what I’ll call the “preferred” attack angle. Batters change their attack angles slightly based on pitch location, but the preferred attack angle represents the plane of a hitter’s natural swing when he gets a good pitch to hit (à la batting practice).

A lot of digital ink has been spilled over the last few years trying to make sense of how to evaluate hitters using launch angles. While a ton of progress has been made, we still have a long way to go. Who knew launch angles could be so complicated? Here, we see a relatively weak correlation between attack angle and launch angle, because launch angle is also strongly dependent on a hitter’s aim, timing, and bat speed. While we don’t have any direct measurements of aim or timing, we can see from the color scale that players with flatter swings (lower attack angles) have more margin for error when it comes to timing, and therefore tend to have higher contact rates than players with uppercut swings (larger attack angles).

League Average Attack and Launch Angles (°), 2015-17
Year Launch Angle Attack Angle
2015 10.5 11.4
2016 11.1 12.0
2017 11.4 13.8
Overall 11.0 12.4
SOURCE: Baseball Savant/Statcast. Players with min 100 BBE in a season

The fly-ball revolution is even more evident when looking at league-wide attack angles instead of launch angles. There was a lot of buzz before this season about players reworking their swings to increase their launch angle. Not all of them were successful though, as the average launch angle only increased by 0.3°, despite a nearly 2° jump in attack angle.

Here are the highest and lowest preferred attack angles in a season during the Statcast era:

Hitter Preferred Attack Angle, 2015-17
Player Year BBE Attack Angle(°)
Brian Dozier 2017 433 29.2
Mike Napoli 2017 268 29.0
Ryan Schimpf 2016 351 27.6
Ryan Howard 2016 220 25.7
Chris Davis 2015 265 25.1
Jarrod Dyson 2016 269 -0.1
Jason Bourgeois 2015 164 -0.2
Justin Morneau 2015 143 -1.4
Billy Burns 2016 279 -1.7
Jonathan Herrera 2015 107 -4.5
SOURCE: Baseball Savant/Statcast. Players with min 100 BBE in a season

It’s good confirmation to see Ryan Schimpf’s name on this list, though it’s interesting that his attack angle isn’t the extreme outlier that his GB/FB ratio and LA are. An analysis of attack angle may also finally give us an answer to why Brian Dozier’s home runs have gone missing this season. His 2017 batting line is almost identical to that of 2016, except his ISO (and HRs) have plummeted. The biggest difference is that his attack angle has skyrocketed from 20° to 29°. We know that the optimal LA for hitting home runs is about 24°, so he’s probably getting too much loft on his fly balls this year. All of these guys at the top of the list would probably benefit from flattening out their swings a bit. Interestingly, Joey Gallo, everyone’s other favorite extreme fly-ball hitter, has an attack angle right at 24° this year. He has built the perfect swing for his batted-ball profile, which explains why he is among the league leaders in HR/FB ratio.

This turned out to be an extremely lengthy primer on swing mechanics, but there are plenty of questions that can be tackled with estimates of swing metrics. For instance, can we use swing speed and attack angle to predict future exit velocities and launch angles? How much do hitters reduce their swing speeds on two-strike counts? How do attack angles change with pitch location? But, alas, those questions will have to be answered at a later time.

A complete list of swing speeds and attack angles for players with at least 100 BBE is available here.


A Metric for Home-Plate Umpire Consistency

When calling balls and strikes, consistency matters. As long as an umpire always calls borderline pitches the same way within a game, players seem to accept variations from the rule book strike zone. While there have been many excellent analyses of umpire accuracy, these studies tend to focus on conformity to a fixed zone, rather than on the dependability of those calls.

Disgruntled fans can turn to Brooks Baseball’s strike zone plots when they feel an umpire has had a bad game against their team. For example, the following zone map seems egregiously bad:

Inconsistent Zone

The calls seem very capricious, especially on the outside (right) of the zone. Balls (in green) are found in the same locations as strikes (in red), and some called strikes landed much further outside than pitches that were called balls.

On the other hand, the zone map below appears fairly consistent:

Consistent Zone

One might quibble with a couple of the outside calls, but the called strikes, for the most part, are contained within a ring of balls. Notice also that pitches in the lower-inside corner were consistently called balls. While this umpire didn’t establish a perfectly rectangular zone, he did establish a consistent zone; neither pitcher got those calls on the inside corner, and hitters on both teams generally knew what to expect.

In this post, I will propose a metric for assessing the inconsistency of an umpire’s strike zone. This metric does not assess how well the umpire conformed to the rule-book zone or the consensus MLB zone. Rather, it uses some tools from computational geometry to compare the overall shape formed by called strikes with the shape formed by the called balls.

Data from MLB Advanced Media describes each pitch as an ordered pair (px, pz), representing the left/right and up/down positions of the ball as it crosses the front of the plate. This pitch-tracking data includes measurements of each batter’s stance, which can be used to normalize the up/down positions to account for batters of different heights. If we draw a scatterplot of these adjusted positions corresponding to called strikes during a given game, the outline of the points represents what we define as the umpire’s established strike zone.

Convex Hull

More precisely, the established strike zone is what mathematicians call the “convex hull” of these points. If you draw the points on a sheet of paper, the convex hull is what would remain if you trimmed the paper as much as possible, without removing any points, using only straight cuts that go all the way across the sheet.

A similar construction describes the alpha hull of a set of points: replace the paper cutter with a hole punch that can only punch out circular holes of a given radius. Punch out as much of the paper as possible, without removing any of the points, and what remains is the alpha hull. Unlike the convex hull, the alpha hull can have empty regions in its interior. We can therefore define an umpire’s established ball zone as the alpha hull of points corresponding to called balls.

Alpha Hull

A consistently-called game should have the property that the established ball zone lies entirely outside of the established strike zone. Any called strikes that fall within the established ball zone (and any balls inside the established strike zone) are inconsistent calls. Since it is reasonable to expect that a consistent umpire will establish different zones depending on the handedness of the batter, we calculate established zones separately for left- and right-handed batters, and then count the number of inconsistent calls from each side of the plate.

Over the course of a game, an umpire’s inconsistency index is the ratio of inconsistent calls to the total number of calls made. For example, the plots below show the established strike and ball zones for the game between the Reds and the Giants on May 12, 2017. Of the 239 calls made that day by the home-plate umpire, 14 balls fell within the established strike zone, while 5 called strikes landed in the established ball zone, resulting in an inconsistency index of (14+5)/239 ≈ 0.0795.

Alpha Hull
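The strike-zone half of this calculation can be sketched in plain Python. The article's own code is in R; what follows is a separate illustration with hypothetical function names. It builds the convex hull of the called strikes, counts balls strictly inside it (treating boundary points as consistent is my assumption), and forms the index. The alpha-hull half is not implemented here, so the count of strikes inside the ball zone is passed in as a number.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def strictly_inside(hull, p):
    """True if p lies strictly inside a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(_cross(hull[i], hull[(i + 1) % n], p) > 0 for i in range(n))


def balls_in_strike_zone(strikes, balls):
    """Count called balls that land inside the established strike zone."""
    hull = convex_hull(strikes)
    return sum(strictly_inside(hull, b) for b in balls)


def inconsistency_index(balls_inside, strikes_inside, total_calls):
    """Ratio of inconsistent calls to all calls made."""
    return (balls_inside + strikes_inside) / total_calls
```

For the Reds-Giants game above, `inconsistency_index(14, 5, 239)` reproduces the index of roughly 0.0795. In practice the hulls would be built separately for left- and right-handed batters, as described earlier.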

How do MLB umpires fare under this metric? Quite well, actually. Using data for the 2017 season (through September 10), the average inconsistency index for all games called was 0.0396. Moreover, of the 2112 games analyzed, there were 183 games where the home-plate umpire scored an inconsistency index of 0.0, meaning that no called ball fell inside the established strike zone and no called strike fell inside the established ball zone. The 15 most consistent umpires, based on their average inconsistency index over all games called in 2017, are given in the table below.

Rank Umpire Inconsistency index
(lower is better)
1.  John Libka  0.0239
2.  Mike DiMuro  0.0253
3.  Nick Mahrley  0.0274
4.  Carlos Torres  0.0275
5.  Chris Segal  0.0275
6.  Chad Fairchild  0.0281
7.  Ben May  0.0281
8.  Travis Eggert  0.0292
9.  Dale Scott  0.0301
10.  Gabe Morales  0.0308
11.  Jim Wolf  0.0310
12.  Sean Barber  0.0310
13.  Eric Cooper  0.0312
14.  Manny Gonzalez  0.0313
15.  Brian Knight  0.0314

While the strike zones of these umpires may not robotically correspond to the rectangles we see on MLB broadcasts, the zones they do establish are remarkably consistent.


Graphs and computations in this article were produced in R, using the PitchRx and alphahull packages. Source code for producing these examples is available on GitHub.


Using Statcast Data to Measure Team Defense

As I’m sure you all know, Statcast allows us to measure the launch angle and velocity for each batted ball. These measurements afford us the ability to estimate precisely the expected wOBA value of every batted ball. Due to the skills of the opposing defense (as well as, admittedly, factors like luck, weather, and ballpark quirks), these estimated wOBA values are often drastically different from their actual values. That is the idea behind Expected Runs Saved (xRS), a metric that I have created to measure team defense. What follows is a discussion of the xRS methodology and some results.

The methodology: The calculation of xRS is actually quite simple. I started by downloading Statcast data from Opening Day through August 29th using Python’s pybaseball module. I then created a dataset consisting of all fair batted balls (excluding home runs) during that time frame. Conveniently, the downloaded data already has the expected wOBA value (based on exit velocity and launch angle), and the actual wOBA value (based on the outcome of the play) for each batted ball. Since we want to penalize teams for making errors, I changed the actual wOBA values for errors from 0 to 0.9 (the value of a single). Then all we have to do is take the average of each metric by team, find the difference, convert that to run values, and we have Expected Runs Saved.
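The steps above can be sketched as follows. This is not the code used for the article: the field names are hypothetical, and the wOBA-to-runs divisor of 1.2 is an assumed round value, since the exact conversion is not stated here.

```python
from collections import defaultdict

SINGLE_WOBA = 0.9   # wOBA value charged to the defense on an error
WOBA_SCALE = 1.2    # assumed wOBA-to-runs divisor; not given in the article


def expected_runs_saved(batted_balls):
    """Return {team: xRS} from an iterable of batted-ball records.

    Each record is a dict with hypothetical keys:
      'team'     - the fielding team
      'xwoba'    - expected wOBA from exit velocity and launch angle
      'woba'     - actual wOBA value of the outcome
      'is_error' - True if the play was scored an error
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # xwOBA sum, wOBA sum, count
    for bb in batted_balls:
        actual = SINGLE_WOBA if bb["is_error"] else bb["woba"]
        acc = totals[bb["team"]]
        acc[0] += bb["xwoba"]
        acc[1] += actual
        acc[2] += 1

    result = {}
    for team, (xw_sum, w_sum, n) in totals.items():
        diff = xw_sum / n - w_sum / n          # average xwOBA minus average wOBA
        result[team] = diff * n / WOBA_SCALE   # positive means runs saved
    return result
```

A positive value means the defense allowed less than the quality of contact predicted; home runs and foul balls should be filtered out before the records are passed in.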

Note that xRS is quite a bit more simplistic than UZR or DRS, as it doesn’t include any of the defensive value derived from keeping baserunners from taking the extra base, preventing steals, turning double plays, etc. While these surely play a role in run prevention, they are less important than converting batted balls into outs, and since I have a full-time job I decided to keep it simple and ignore them.

The results: Let’s start with the most obvious question: which team has the best defense?

It’s the Angels, and it’s not particularly close. While their pitchers have allowed a lot of hard contact (.323 batted-ball xwOBA, 28th in baseball), their actual wOBA on contact is 2nd in baseball at .291, trailing only the Dodgers (.284), who, as Jeff Sullivan recently noted, excel at inducing weak contact.

On the opposite end of the spectrum are the Blue Jays, who have been generally good at generating weak contact (.305 batted-ball xwOBA, 5th in baseball) but terrible at converting those weakly hit balls into outs (.322 batted-ball wOBA, 28th in baseball).

In both cases UZR tends to agree, ranking the Angels and Blue Jays 1st and 27th, respectively. Due to (I think) the simplicity of the model, the run values for xRS are quite a bit more extreme than those of either UZR or DRS, but it ranks the teams in generally the same order. At the very least, xRS doesn’t disagree with UZR and DRS much more than the latter two disagree with each other.

Two teams that xRS likes a lot more than UZR and DRS are the Mariners (2nd in xRS, 11th in UZR, 15th in DRS) and Yankees (4th in xRS, 13th in both UZR and DRS). Meanwhile, it dislikes the Dodgers (12th in xRS, 3rd in UZR, 1st in DRS) relative to the other metrics, as well as the Reds (28th in xRS, 5th in UZR, 4th in DRS). Why is this happening? I really don’t know. Could be some defensive components I have left out of xRS, could be ballpark effects, or it could just be that defensive metrics are weird. It remains a mystery. Such is baseball, and such is life.


Predicting the Playoffs

By Dr. Gregory Wood and David Marmor

Among the sabermetric community, the baseball postseason has the reputation of being random. In the 20 seasons from 1996 to 2015, the predicted winner (i.e., the team with the best regular-season record) won the World Series only four times. This raises the question of which specific skills and performances of a team during a season have a meaningful correlation, if any, with postseason success. This study analyzed data from every playoff team from 1996-2015 to search for significant relationships that could be used to predict postseason wins.

The first method that I used was looking for linear correlations between regular-season statistics and various measures of postseason success. If some statistics were more correlated to playoff success, they could be used to predict a team’s playoff performance.

The most obvious place to start was regular-season wins. As I had expected, there was very little correlation between regular-season wins and postseason wins.

In the graph below, every playoff team’s regular-season wins have been plotted against their playoff wins. The data has an extremely low correlation coefficient and is not a good fit with the trend line. The correlation coefficient was 0.007, far below the 0.6 or higher that is usually taken to indicate a strong relationship. It appears that regular-season record is not a significant factor in postseason success. This helps explain why postseason success is considered random.

[Figure: regular-season wins vs. playoff wins]

The goal was to find another statistic that had a significantly stronger correlation to playoff success. I studied many other statistics including runs, runs allowed, ERA, hits and hits allowed, home runs and home runs allowed, walks and walks allowed, strikeouts and strikeouts allowed, slugging percentage, and on-base percentage.

For each one I drew a scatterplot and computed the correlation coefficient, assuming a linear relationship. However, the R-squared term was always very small no matter what I tried. This was true even with statistics that are vital to regular-season success, like ERA, OBP, runs and runs allowed.


I looked at both the actual totals as well as the totals adjusted for that year’s league average. That way I could account for the fact that the total runs scored has varied quite a bit over the 20 years.
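For readers who want to repeat this kind of check, here is a minimal sketch of the correlation calculation. The function names are my own, and dividing by the league average is only one assumed way of adjusting for the run environment.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """R-squared of a simple linear fit, which equals r squared."""
    return pearson_r(xs, ys) ** 2


def league_adjusted(value, league_average):
    """Express a team total relative to that season's league average."""
    return value / league_average
```

Each regular-season statistic (raw or adjusted) goes in as `xs`, and one of the measures of playoff success goes in as `ys`.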

I also tried defining playoff success in three different ways: playoff wins, playoff series won, and playoff winning percentage. However, I got similar results no matter which method I used. None of them had correlations that were significant either way. The statistic that correlated best to playoff wins was run differential, but even it was too weak a correlation to be meaningful.

[Figure: run differential vs. playoff wins]

The R-squared is still very small, so run differential is not a good predictor of post-season success. This method seems to suggest that the playoffs are in fact random. However, while each statistic individually was not strongly tied to playoff success, maybe combinations of them were.

To find combinations that might be meaningful, I tried using linear modeling. I used a computer program to find the best-fit line between playoff success and the regular-season statistics I was using. The model adjusted the weight given to the different factors to try and find results that were closest to what actually happened by minimizing its chi-squared term. The advantage of this method was that it could combine several factors at once. That way it could determine if there were certain factors that were important in playoff play.

The program was designed to run thousands of simulations at a time to try and improve on its previous best result by minimizing its error compared to the actual results. For each run I selected which statistics would be used. I could give the simulation different starting assumptions and set ranges for how much weight each category could be given. When the initial conditions were changed, the simulation would return different results. However, it was never able to find a result that was statistically significant. The best coefficient of correlation I found was 0.063, far below the level that implies correlation.
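The actual program is not shown here, so the following is only a guess at its general shape: a random search over weights within user-set ranges that keeps the combination with the smallest squared error. With equal uncertainties on every data point, that squared error plays the role of the chi-squared term. All names are hypothetical.

```python
import random


def squared_error(weights, rows, targets):
    """Sum of squared differences between weighted stats and playoff results."""
    total = 0.0
    for row, target in zip(rows, targets):
        predicted = sum(w * x for w, x in zip(weights, row))
        total += (predicted - target) ** 2
    return total


def random_search(rows, targets, bounds, n_iter=5000, seed=0):
    """Try n_iter random weight vectors and keep the best one.

    rows    - one list of regular-season statistics per team
    targets - the matching measure of playoff success per team
    bounds  - (low, high) range allowed for each weight
    """
    rng = random.Random(seed)
    best_w = [rng.uniform(lo, hi) for lo, hi in bounds]
    best_err = squared_error(best_w, rows, targets)
    for _ in range(n_iter):
        cand = [rng.uniform(lo, hi) for lo, hi in bounds]
        err = squared_error(cand, rows, targets)
        if err < best_err:          # keep only improvements
            best_w, best_err = cand, err
    return best_w, best_err
```

Changing `bounds` or `seed` corresponds to changing the starting assumptions, which is why different runs can return different weights.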

It seems that the sabermetric community is correct. Playoff performance is random and not predictable by regular-season performance. Therefore, teams should attempt to build the best regular-season team they can and hope to then get lucky in the playoffs, as opposed to trying to plan specifically for the playoffs.

Appendix

[Figure: runs vs. playoff wins]

[Figure: runs allowed vs. playoff wins]

[Figure: home runs vs. playoff wins]

[Figure: batting average]


dScore: End of August SP Evaluations

I went over the starters version of dScore here, so I’m not going to re-visit that here. I’ll just jump right in with the list!

Top Performing SP by Arsenal, 2017
Rank Name Team dScore +/-
1 Corey Kluber Indians 69.41 +2
2 Max Scherzer Nationals 62.97 -1
3 Chris Sale Red Sox 56.82 -1
4 Clayton Kershaw Dodgers 55.26 +1
5 Noah Syndergaard Mets 47.39 +2
6 Stephen Strasburg Nationals 47.24 +5
7 Danny Salazar Indians 43.46 +16
8 Randall Delgado Diamondbacks 42.00 +1
9 Luis Castillo Reds 37.99 +5
10 Alex Wood Dodgers 40.72 -8
11 Zack Godley Diamondbacks 39.55 -1
12 Luis Severino Yankees 39.24 +1
13 Jacob deGrom Mets 36.69 -1
14 Dallas Keuchel Astros 37.37 -8
15 James Paxton Mariners 35.81 +1
16 Carlos Carrasco Indians 34.23 +4
17 Sonny Gray Yankees 30.59 UR
18 Brad Peacock Astros 29.98 +6
19 Lance McCullers Astros 32.18 -11
20 Buck Farmer Tigers 31.31 UR
21 Nate Karns Royals 30.21 -2
22 Zack Greinke Diamondbacks 29.45 -4
23 Charlie Morton Astros 28.55 UR
24 Kenta Maeda Dodgers 27.40 -7
25 Masahiro Tanaka Yankees 26.83 -3

 

Risers/Fallers

Danny Salazar (+16) – dScore never gave up on him, despite him being absolute trash early on this year. He came back and dominated, launching him up the ranks even farther in the process. Current status: injured. Again.

Sonny Gray (newly ranked) – If there were any doubts about the Gray the Yankees dealt for, he’s actually surpassed his dScore from his fantastic 2015 season. He’s legit (again).

Alex Wood (-8) – Looks like the shoulder issues took a bit of a toll on his stuff, but dScore certainly isn’t out on him.

Dallas Keuchel (-8) – Keuchel’s stuff isn’t the issue. He’s still a buy for me.

Lance McCullers (-11) – Poor Astros. Maybe not too poor though; their aces have gotten hammered but haven’t fallen far at all. McCullers is going to bounce back.

 

The Studs

Some light flip-flopping at the top, with Kluber taking over at #1 from Scherzer. The Klubot’s been SO unconscious. Everyone else is pretty much the usual suspects.

 

The Young Breakouts (re-visited)

Zack Godley (11) – He’s keeping on keeping on. He barely moved since last month’s update, and I’m all-in on him being a stud going forward.

Luis Castillo (9) – He’s certainly done nothing to minimize the hype. In fact, he’s added a purely disgusting sinker to his arsenal and it’s raising the value of everything he throws. Also, from a quick glance at the Pitchf/x leaderboards, two things stand out to me. He seems to have two pitches that line up pretty closely to two top-end pitches: his four-seamer has a near clone in Luis Severino’s, and his changeup is incredibly similar to Danny Salazar’s. That’s a nasty combo.

James Paxton (15) 

 

The Test Case

Buck Farmer (20) – Okay, so to be honest when he showed up on this list, I absolutely thought it was a total whiff. By ERA he’s been a waste, but he’s really living on truly elite in-zone contact management, swinging strikes, K/BB, and hard-hit minimization. His pitch profile is middling (not bad, but not great either), so I really don’t think he’s going to stay this high much longer. He’s certainly doing enough to earn this spot right now, and I’d expect him to not run a 6+ ERA for much longer.

 

The Loaded Teams

Yankees – Luis Severino (12), Sonny Gray (17), Masahiro Tanaka (25) / Some teams have guys higher up, but the Yankees are loaded up and down.

Astros – Dallas Keuchel (14), Lance McCullers (19), Brad Peacock (18), Charlie Morton (23) / Similar to the Yankees. Morton and Peacock are having simply phenomenal years.

 

The Dropouts

Rich Hill (39)

Trevor Cahill (35)

Marcus Stroman (28)

Poor Rich Hill. Lost his perfect game, then lost the game, then lost his spot in the top 25. Cahill’s regressed to #DumpsterFireTrevor since his trade to the Royals. Stroman really didn’t fall that far…and his slider is still a work of art.

 

The Just Missed

Jordan Montgomery (26) – Too bad the Yankees couldn’t send down Sabathia instead. This kid is good.

Aaron Nola (27) – #Ace

Carlos Martinez (29) – Martinez simply teases ace upside, but frankly I think you can pretty much lump him and Chris Archer (30) in the same group — high strikeouts, too many baserunners and sub-ace starts to move into the top tier.

Dinelson Lamet (32) – He’s absolutely got the stuff. He could stand to work on his batted-ball control though.

Jimmy Nelson (34) – dScore buys his changes. He finished at #148 last year. I’ll call him a #2/3 going forward.

 

Notes from Farther Down

Jose Berrios is all the way down to 47. His last month cost him 19 spots, but frankly it could be much worse: Sean Manaea lost 39 spots, down to 87. Manaea really looks lost out there. I don’t want to point at the shoulder injury he had earlier this year since his performance really didn’t drop off after that…but I’m wondering if he’s suffering from some fatigue that’s not helped by that. He’s pretty much stopped throwing his toxic backfoot slider to righties, and that’s cost him his strikeouts. Michael Wacha is another Gray-like Phoenix: he’s up to 52 on the list, once again outperforming his 2015 year. I’m cautiously buying him as a #3 with upside. And finally, buzz round: Mike Clevinger (33), Alex Meyer (36), Robbie Ray (38), Rafael Montero (41), and Jacob Faria (43) are already ranked quite highly, and outside of Montero and maybe Meyer I could see all of them bumping up even higher. Clevinger’s really only consistency away from being a legitimate stud.

 

My next update will be the end-of-season update, so I think I’m going to do a larger ranking than just the top 25; maybe all the way down to 100. Enjoy the last month-plus!


The Correlation Between BABIP Rate and Three True Outcomes

First things first, I would like to credit my friend Elling Hofland for coming up with the main idea of this piece. He’s the one who provided me with his thoughts and theories that allowed me to expand on this topic in the first place. Give him a follow on Twitter for sports and stats-related banter; his handle is @ellinghofland.

BABIP, or batting average on balls in play, is an incredibly useful stat. It does a fantastic job of combining luck and quality of contact to give a better grasp of how a player actually performs during batted-ball events. These batted-ball events only take up a certain percentage of a player’s plate appearances. BABIP rate focuses on how many batted-ball events a player has relative to the number of plate appearances they have. To calculate BABIP rate, you take at bats minus strikeouts and home runs, plus sacrifice flies, and divide that by plate appearances. For example, if a player has 600 PA during a single season along with 300 batted-ball events, they have a BABIP rate of .500.

Now, if you look at the three variables taken out of that equation, you’re left with walks, strikeouts, and home runs, otherwise known as the “three true outcomes.” These are called true outcomes due to the fact that none of them (for the most part) involve defense on the field. A shortstop can’t screw up a strikeout, walk, or a home run. You can take these three true outcomes and turn them into a rate as well. If you add up a player’s strikeouts, walks, and home runs and then divide them by plate appearances, you get TTO rate.
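Both rates are simple enough to write down directly. The sketch below uses my own function and argument names, with the formulas taken straight from the definitions above.

```python
def babip_rate(ab, k, hr, sf, pa):
    """Share of plate appearances that end with a ball in play."""
    return (ab - k - hr + sf) / pa


def tto_rate(k, bb, hr, pa):
    """Share of plate appearances that end in a strikeout, walk, or home run."""
    return (k + bb + hr) / pa


# Made-up season: 600 PA = 550 AB + 40 BB + 10 SF
print(babip_rate(ab=550, k=200, hr=60, sf=10, pa=600))  # 0.5
print(tto_rate(k=200, bb=40, hr=60, pa=600))            # 0.5
```

Adding the two formulas together gives (AB + BB + SF) / PA, so the two rates sum to 1 minus the small share of plate appearances that end in a hit by pitch, sacrifice bunt, or similar event. The two rates are therefore very nearly mirror images of each other by construction.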

Let’s look at Mike Trout. In 2017, Trout’s BABIP currently sits at .369. However, he has a BABIP rate of .550 along with a TTO rate of .435, meaning that 55% of his plate appearances end with a ball in play, while 43.5% of his plate appearances result in a strikeout, walk, or home run. Both BABIP rate and TTO rate are useful stats, as they essentially show how well and how often a player makes contact. While BABIP itself is useful, it can be hard to tell how luck is involved in a batted-ball event when it isn’t hit over a fence for a homer. BABIP rate attempts to bridge the gap between BABIP and the three true outcomes.

Miguel Sano is a well-known slugger. In his three seasons in the majors, he’s smashed the ball when he’s hit it, boasting average exit velocities of 94.0 MPH in 2015, 92.3 MPH in 2016, and 93.1 MPH in 2017. Despite these consistent EVs, his BABIP has fluctuated from 2015 to 2017, with marks of .396, .329, and .385, respectively. His BABIP rates from 2015-2017 look like this: .429, .478, and .473. Despite the difference in his BABIP from 2016 to 2017, his BABIP rate has stayed nearly the same, meaning that he’s still making the same amount of contact with the ball despite fewer balls falling for hits in 2016. Looking solely at BABIP, it could be argued that 2016 was his “regression” to where he should be after sporting an incredibly high BABIP in 2015. In 2017, one could say his high BABIP is a cause for concern, as he may just be getting lucky. However, his BABIP rate shows that isn’t the case.

Let’s look at another player, Brandon Phillips. Phillips’ BABIP has been incredibly consistent during his past three years, sitting at .315 in 2015, .312 in 2016, and .305 in 2017. Additionally, his BABIP rates have been .820, .816 and .802. Phillips puts the ball in play roughly 80% of the time on a regular basis.

So, as you can imagine, there is a real link between BABIP rate and TTO rate. The more contact a player makes, the less they tend to walk or strike out. Thus, a high BABIP rate goes with a low TTO rate. This is exactly what we see if we attempt to correlate these two stats. Below is a snapshot of a graph that shows TTO rate vs. BABIP rate.

TTO vs BABIP rate

Player names aren’t included because, A) they clutter the graph, and B) they aren’t necessary at this point. Accompanying this graph is a trend line with an R-squared value, otherwise known as the coefficient of determination. Essentially, an R-squared value measures how well your model fits your data, or in this case, how closely correlated TTO rate and BABIP rate are to each other. It turns out that the R-squared value is .991, which means that BABIP rate and TTO rate fit together very well: in fact, you’ll find that TTO rate and BABIP rate are almost exact opposites of each other. The players with the 10 lowest BABIP rates in MLB all have TTO rates of .437 or higher, meaning that their plate appearances end in a walk, home run, or strikeout at least 43.7% of the time. Conversely, the players with the highest BABIP rates all have TTO rates of .225 or lower.

We can also derive more information from these numbers using this correlation. Players who have a low BABIP rate tend to have a very high OPS. Remember, these players also have high TTO rates. The 10 players with the lowest BABIP rates (Judge, Sano, K. Davis, Souza Jr., Reynolds, Morrison, J. Upton, C. Santana, Lamb, and Stanton) all have an OPS of .841 or higher. The players with the highest BABIP rates (or lowest TTO rates) have an OPS of .798 or lower.

BABIP rate can tell us a lot about a player. Just by glancing at a player’s BABIP rate, you can get an instant idea of how often the player walks, strikes out, or hits dingers. Not only that, but it can tell you a lot about their offensive production. High TTO rates usually mean high hard-hit rates along with high exit velocities. BABIP rate also helps us understand BABIP itself better and teaches that you can’t judge a player by BABIP all the time. In most cases, players with an over-inflated BABIP (relative to past performances) just tend to mash the absolute heck out of the ball, as told by their low BABIP rates and high TTO rates. On the opposite end, players with a steady BABIP will have very high BABIP rates and tend to be contact hitters who put the ball in play and don’t hit for power. BABIP rate, along with its correlation to TTO rate, has the potential to be a powerful, tell-all offensive stat.


Why the Mets Should Call Up Tim Tebow in September

As of August 21st, 2017, Tim Tebow was slashing .220/.304/.343 between the New York Mets’ Low-A team, the Columbia Fireflies (South Atlantic League), and their Advanced-A squad, the St. Lucie Mets (Florida State League). In 442 minor-league plate appearances, he is the owner of a .304 wOBA, and is striking out at a roughly 26% clip while walking in nearly 9% of his plate appearances. For every ball that Tebow elevates, he is hitting three on the ground. Right off the bat (pun intended), it is evident that Tebow’s offensive game leaves something to be desired.

Let’s take a quick look at how Tebow stacks up with the average hitter, in each A-ball league, that has had a minimum of 200 plate appearances and has primarily played the same position(s) as Mr. Tebow (outfield & designated hitter):

*Data as of 8/21/2017
Player Age BB% K% AVG OBP SLG OPS wOBA wRC+
Tim Tebow 30 8.8% 26.5% 0.220 0.304 0.343 0.647 0.304 90
Avg. SAL OF/DH 21.5 7.7% 21.9% 0.253 0.322 0.378 0.700 0.322 104
Avg. FSL OF/DH 23 8.2% 21.4% 0.255 0.324 0.370 0.694 0.324 103

Only his walk rate appears to be on par with each respective league’s average. Additionally, Tebow has logged a .913 fielding percentage while playing (primarily) left field this year. It is widely understood that fielding percentage is a “far-from-perfect” measurement for quantifying defensive ability, but it can provide a high-level perspective on one’s aptitude as it relates to fielding the baseball. To put Tebow’s number into context, the lowest fielding percentage in the major leagues this year by an outfielder (minimum 100 innings played) belongs to Mark Canha of the Oakland A’s, at .922.

Many words come to mind when attempting to summarize the 30-year-old’s all-around quality of play while in A-ball; ‘excellent’, ‘incredible’, and ‘promising’ are not among them. However, despite the subpar statistical measuring points, the Mets should seriously consider calling up Tim Tebow to the big leagues come September.

No, that is not a typo. Yes, you read the last sentence of the above paragraph correctly. When rosters expand on September 1st to include anyone on the 40-man roster, the New York Mets should give sincere thought to adding Tim Tebow to their big-league club. Now, why would the New York Mets, a team that owns a 55-71 win-loss record and trails the NL Wild Card race by 13.5 games and the NL East Division title by 21 games, bother calling up a poorly-performing 30-year-old high-A-ball player? The answer, as it is with many things in life, is money.

Baseball clubs generate revenue in many ways: merchandise sales, concessions sales, corporate sponsorships, media deals, etc. One of the largest and most obvious ways in which income at the major-league club level is generated is through home-park ticket sales. Tim Tebow excels at putting fans in the stands:

YoY Average Home Game Attendance Figures

Year Columbia Fireflies St. Lucie Mets
2016 3,768 1,405
2017 4,783 1,996
YoY % Change 27% 42%

As you can see, both teams that Tebow has played for this year have experienced huge jumps in home attendance figures. This has occurred despite the fact that in 2016 the Columbia Fireflies were celebrating their inaugural season at a brand new stadium, and the St. Lucie Mets were 11 games over .500 in the thick of a playoff race (compared to 11 games under .500 in 2017 at the time of this publication).

As I alluded to above, a lot of circumstances can impact attendance figures: new stadium, weather, promotions, team quality, opponent, etc. However, I think that it’s pretty evident that Tim Tebow’s arrival on the Mets’ minor-league scene has driven most of the jump. To confirm this, let’s look at attendance figures from a different angle: specifically, 2017 home attendance numbers and how they vary for each team between when Tebow was actively rostered and when he was not:

*Data as of 8/19/2017
Team Tebow Rostered # of Home Games Avg. Home Game Attendance % Change
Columbia Fireflies No 20 3,757
Columbia Fireflies Yes 41 5,308 41%
St. Lucie Mets No 37 1,745
St. Lucie Mets Yes 24 2,419 39%

Again, it’s evident that Tim Tebow’s roster presence has enticed people to come to the home team’s ballpark at a clip roughly 40% greater than when he was not on the team.
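A percentage like this depends on which figure serves as the base, so it is worth being explicit. The short Python sketch below (the function names are my own) reads the averages from the two attendance tables above both ways: as growth over the lower, non-Tebow figure, and as the share of the larger crowd that is incremental.

```python
def pct_change(baseline, new):
    """Change from baseline to new, as a fraction of the baseline."""
    return (new - baseline) / baseline

def incremental_share(baseline, new):
    """Portion of the new figure that is incremental over the baseline."""
    return (new - baseline) / new

# Average home attendance: (2016 or not rostered, 2017 or rostered)
splits = {
    "Columbia, 2016 vs. 2017": (3768, 4783),
    "St. Lucie, 2016 vs. 2017": (1405, 1996),
    "Columbia, not rostered vs. rostered": (3757, 5308),
    "St. Lucie, not rostered vs. rostered": (1745, 2419),
}

for label, (baseline, new) in splits.items():
    print(f"{label}: {pct_change(baseline, new):.0%} over the baseline, "
          f"{incremental_share(baseline, new):.0%} of the larger crowd")
```

Measured against the baseline, the rostered splits come out near 40%; measured as a share of the Tebow-era crowd, they come out near 30%.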

So how do we translate these attendance figures into dollars and cents? Since I do not have access to either team’s ticketing database, this is where some assumptions about average per-cap and ticket value will have to come into play. Baseball America’s JJ Cooper & Josh Norris have recently written articles that similarly examine Tebow’s impact at the box office. However, their stories concentrate heavily on road attendance and overall league attendance impacts, rather than the home ballpark’s ticket sales (which are critical to driving an organization’s recognized revenue). In his article, Norris notes that most minor-league operators use a $21 per-cap estimate for fan spending. This figure is an estimate of what each fan who enters the ballpark will have paid in tickets, concessions, merchandise, and parking.

For the first 39 home game dates (41 games due to two doubleheaders) of their 2017 season, the Columbia Fireflies were able to showcase Tim Tebow in uniform. They attracted 207,031 fans. In the first 39 home game dates of their inaugural 2016 season, the Fireflies drew 155,132 fans. The difference between 2017 and 2016 for these first 39 home game dates is 51,899 fans. If we apply the $21 per-cap estimate referenced above, we are looking at about $1.1 million in additional revenue that can be largely attributed to Tebow being in uniform.

Tebow’s last game for the Fireflies was on June 25th; his first game for the St. Lucie Mets was on June 28th. Through August 18th, Tebow has been a member of St. Lucie’s roster for 22 home game dates (24 games due to two doubleheaders) and has helped attract 53,207 fans. In 2016, the St. Lucie Mets drew 21,097 during the same stretch. If we apply the $21 per-cap estimate to the difference of 32,110 fans, that amounts to $674,310 in additional revenue over the course of the 22 home game dates at this point in the season. Additionally, Tebow has undoubtedly drawn an abundance of new consumers to each team’s ballpark and database. This is information that can be leveraged for future sales and marketing initiatives. It would not be ludicrous to state that, combined, the Mets’ A-ball affiliates have increased home-park revenues by roughly $1.8 million due to Tim Tebow.
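The per-cap arithmetic above is simple enough to check in a few lines of Python. This is only a sketch of the calculation described in the text; the function name is my own, and the $21 per-cap figure is the estimate Norris cites, not actual team data.

```python
PER_CAP = 21  # dollars per fan; the estimate cited from Norris, not team data

def added_revenue(fans_2017, fans_2016, per_cap=PER_CAP):
    """Extra fans over the same stretch of 2016, valued at the per-cap rate."""
    return (fans_2017 - fans_2016) * per_cap

columbia = added_revenue(207_031, 155_132)
st_lucie = added_revenue(53_207, 21_097)

print(f"Columbia:  ${columbia:,}")
print(f"St. Lucie: ${st_lucie:,}")
print(f"Combined:  ${columbia + st_lucie:,}")
```

Swapping in a different per-cap estimate only requires changing `PER_CAP`, which makes it easy to see how sensitive the total is to that one assumption.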

Let’s take a hypothetical look at these trends from the 2017 New York Mets’ point of view. Their current 40-man roster sits at 36 occupants, so there is no risk of having to DFA a player in order to bring on a newcomer. They are far removed from the playoffs, and already have their sights set on next year. Even after adding Tebow to the 40-man roster, they would have three additional spots to work with should they want to expose some of their MLB-ready prospects to low(er)-leverage big-league games in September. The Mets would have to pay Tebow a pro-rated MLB minimum salary, which would come to about $65K for the final four weeks of the season, pennies compared to what he would bring back in return.

Here is a table of the historical attendance at Citi Field for the month of September since 2010:

Year Citi Field Sept. Attendance # of Games
2010 382,306 14
2011 433,251 16
2012 385,292 16
2013 340,799 15
2014 337,343 13
2015 353,005 11
2016 468,283 14
2017 ? 14

I’ve singled out 2014 because it most closely resembles the environment that the 2017 Mets are heading into, as you can see below:

*Through 122 games
Year Winning % GB – Division GB – Wild Card Weekday Home Games Weekend Home Games
2014 0.467 10.5 7.5 7 6
2017 0.443 20 12 8 6

As you can see, the 2014 and 2017 Mets were/are both clearly out of the playoff picture and had/have a similar distribution of home games throughout the month of September. Despite one more overall September game in 2017, the 2014 season should prove to be a good starting point for us; because of the extra game, let’s estimate that the Mets will bring in around 339,000 people to Citi Field in September of 2017.

Now, the fun part. How does that audience, and consequently revenue, project to increase if Tim Tebow were added to the roster? It would be rather difficult to forecast how a marketplace like New York City would react to a move of that nature. There are countless variables that could be considered: chilly September temperatures and weather volatility, the inability to purchase season packages so late in the year, how the NYC marketplace compares to those of Columbia, SC and St. Lucie, FL, the matter of the media, the beginning of football season, and so on. For simplicity’s sake, let’s assume that New York’s market would respond much as Columbia’s and St. Lucie’s did, and use a home attendance gain of 30%. That would push an additional 102,000 customers through the Citi Field turnstiles during the last four weeks of the season.

The average MLB ticket price in 2016 was $31.00, a 7% increase from the previous year. Another 7% increase from the 2016 ticket price would put us just over $33.00 for 2017. This gives us a place to start with regard to estimating revenue impact. I don’t have access to the Mets’ ticketing database, so this barometer will do for the time being. My gut tells me that the $33.00 price point is low; typically season-ticket prices are used when calculating the league-wide annual average ticket price, and season tickets are sold at a discount compared to single-game ticket prices. Since it is September, most fans who would turn out to see Tebow would be purchasing at the single-game ticket price point (or group-ticket price point, but that complicates things further), because season packages are likely no longer being sold for 2017.

Regardless, at this point the math becomes clear: 102,000 additional fans at $33.00/ticket would generate an estimated $3.4 million in surplus revenue. This doesn’t even include the additional revenue that would accrue via a multitude of other outlets. Concessions, merchandise, and parking, all revenue streams that the Mets split with their respective vendors, would experience huge jumps. Strategies to boost season-ticket-holder retention for 2018 (Tim Tebow meet and greet, anyone?) would likely yield positive results. As stated before, entirely new ticket buyers would flood into the Mets’ ticketing database, which should boost returns in some form or fashion in future years.
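To make the chain of assumptions explicit, here is a rough Python sketch of the projection. Every input is one of the estimates made above (339,000 baseline fans, a 30% lift, a 2016 average price of $31.00 grown by 7%, and a roughly $65K pro-rated salary); the function name is my own, and the loop at the end simply reruns the same arithmetic under smaller assumed lifts.

```python
def project_ticket_revenue(baseline_attendance, lift, ticket_price):
    """Return (extra fans, extra ticket revenue) for a given attendance lift."""
    extra_fans = baseline_attendance * lift
    return extra_fans, extra_fans * ticket_price

ticket_2017 = 31.00 * 1.07   # 2016 average price grown by another 7%
extra_fans, revenue = project_ticket_revenue(339_000, 0.30, ticket_2017)
net = revenue - 65_000       # less the estimated pro-rated salary

print(f"2017 ticket price estimate: ${ticket_2017:.2f}")
print(f"Extra fans: {extra_fans:,.0f}")
print(f"Extra ticket revenue: ${revenue:,.0f}")
print(f"Net of salary: ${net:,.0f}")

# Sensitivity: the same arithmetic under smaller assumed lifts.
for lift in (0.10, 0.20, 0.30):
    fans, rev = project_ticket_revenue(339_000, lift, ticket_2017)
    print(f"lift {lift:.0%}: {fans:,.0f} fans, ${rev:,.0f}")
```

With the unrounded inputs the ticket revenue lands within a few thousand dollars of the rounded 102,000 × $33.00 figure; both round to about $3.4 million.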

Tim Tebow is not going to play baseball forever. He may choose to call it quits on his “pro-ball quest” after this year. Who’s to say he even wants to go through another year toiling away in the low minor leagues? A promising and young (albeit injury-prone) starting pitching staff should have the Mets within shouting distance of playoff contention for the next couple of years. If that is the case, they will not want to waste an NL roster spot on a subpar, 31-year-old designated hitter. Roughly $3.5 million should allow the Mets to chase around 0.5 WAR on the open market. It could provide them additional wiggle room to take on extra salary in a deadline trade next year. It would allow the acquisition of players along the lines of Trevor Cahill, Logan Morrison, or Drew Storen, all of whom signed for under $3 million this past offseason. It could be put toward additional infrastructure, baseball analytics, or scouting staff.

Sure, there are certainly more deserving players in the Mets’ minor-league system who have ‘paid their dues’ to a greater extent than Tim Tebow, all in the hopes of getting a call-up to the Show. But baseball is a business, and at the end of the day, no one in the Mets’ system will be able to have an impact on fans the same way that Tim Tebow can. The Mets need to capitalize on their current situation before the former Heisman Trophy winner tires of the long and uncomfortable bus rides, motel stops, and food spreads that dot the minor-league landscape. The Mets need to cash in on their investment before Tebow bids baseball adieu.