2016 ALCS Game One: Batter vs. Pitcher Stats

The FanGraphs Twitter page tweeted out a bingo card for Game One of the ALCS. As I looked through it, I thought it was a terrific idea by Michelle Jay and a fun way to follow the game that night. I was going to play along, but then I had another idea. Some slots were much more likely to happen, such as the “Pitcher v hitter stats are mentioned” slot. I figured I would let somebody else receive a t-shirt and just count up exactly how many times the TBS broadcast team mentioned batter vs. pitcher stats. We all know announcers love doing this, and we all know that it’s pretty useless for predicting the outcome of that particular at-bat. I just thought it would be cool to experiment and see how many times they actually mentioned these stats.

First, I’ll just go over the final numbers for batter vs. pitcher stats. There were 65 batters in this game, and batter vs. pitcher stats were either mentioned by the announcers or shown on a graphic for eight of those batters. There were two separate times where they showed a graphic and then mentioned the stats later in the plate appearance, or vice versa. Four of the eight instances occurred when the Jays were hitting against Corey Kluber, three of the eight came when Andrew Miller was pitching, and the last one came when Marco Estrada was on the mound. It’s interesting that they mentioned those stats so often with a reliever on the mound, considering the sample size is sure to be even smaller against relievers than against starters.

For fun, I marked each occurrence and tried to quickly type out how the announcer mentioned these stats:

  1. Top 1, Josh Donaldson vs. Corey Kluber: “He’s got some pretty good numbers, 6 for 16 with a jack, so he sees him well” -Cal Ripken
  2. Top 1, Russell Martin vs. Corey Kluber: “Martin is only 2 for 10 in his career against Kluber, both home runs…in fact, two of his last seven off Kluber have been home runs” -Ernie Johnson (graphic added later in the plate appearance reading “2 for last 7 off Kluber with 2 HR”)
  3. Top 2, Michael Saunders vs. Corey Kluber: “Saunders steps in, he’s 3 for 8 in his career against Kluber, and he fouls it off” -Ernie Johnson
  4. Top 6, Michael Saunders vs. Corey Kluber: “Saunders with his two hits, now 5 for 10 off Kluber” -Ron Darling
  5. Bottom 6, Jason Kipnis vs. Marco Estrada: graphic shown reading “0 for 7 4 K VS ESTRADA”
  6. Top 7, Melvin Upton Jr. vs. Andrew Miller: “Upton’s got some numbers against Miller, 5 for 12 with three home runs” -Ron Darling (“That is some numbers” -Cal Ripken)
  7. Top 8, Edwin Encarnacion vs. Andrew Miller: “Encarnacion in his last six at-bats against Miller a couple of home runs and a double” -Ernie Johnson
  8. Top 8, Jose Bautista vs. Andrew Miller: graphic shown reading “.286 (2 for 7) 1 HR 2 BB VS MILLER” (later in the plate appearance: “One of the two hits that Bautista has off Miller…long ball” -Ron Darling)
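The tally above is easy to double-check with a few lines of Python. This is only a bookkeeping sketch: the `mentions` list is transcribed from the eight instances listed above, and the variable names are my own.

```python
from collections import Counter

# (batter, pitcher) for each of the eight instances listed above
mentions = [
    ("Josh Donaldson", "Corey Kluber"),
    ("Russell Martin", "Corey Kluber"),
    ("Michael Saunders", "Corey Kluber"),
    ("Michael Saunders", "Corey Kluber"),
    ("Jason Kipnis", "Marco Estrada"),
    ("Melvin Upton Jr.", "Andrew Miller"),
    ("Edwin Encarnacion", "Andrew Miller"),
    ("Jose Bautista", "Andrew Miller"),
]
TOTAL_BATTERS = 65  # batters in the game, per the count above

by_pitcher = Counter(pitcher for _, pitcher in mentions)
mention_rate = len(mentions) / TOTAL_BATTERS

print(by_pitcher.most_common())
print(f"{len(mentions)} of {TOTAL_BATTERS} batters ({mention_rate:.1%})")
```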

I’m not trying to knock these announcers by saying that they’re not good at what they do or anything. I would be a terrible announcer. I just think these stats are pretty useless and it was interesting to see how many times they actually mentioned them during a game. Mike Petriello pointed out on Twitter an example of why these numbers aren’t good to look at.

This would be kind of fun to track during the regular season for the really good ones, such as “so and so: 1 for 2 (.500), single career vs. so and so.” Maybe this can be a new metric or something, bpBAAR (batter pitcher Baseball Announcer Above Replacement).


Clustering Pitchers With PITCHf/x

At any point, feel free to scroll down to the bottom to see some of the tables of pitcher clusters.

Clustering Pitches

Clustering individual pitches using data from PITCHf/x is a fairly simple task. All you need to do is pick out the important attributes that you believe define a pitch (velocity, movement, etc.) and use a clustering algorithm, such as K-Means clustering.

With K-Means clustering, you decide what K (the number of clusters) should be. For my analysis, I chose K to be 500 (rather arbitrarily). Different pitch clusters can represent the same type of pitch (e.g., a fastball) but with varying attributes. For example, clusters 50 and 100 might both correspond to fastballs, but cluster 50 might be a typical Chris Young fastball whereas cluster 100 might be a typical Aroldis Chapman fastball.
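To make the idea concrete, here is a minimal K-Means sketch in plain Python. It is not the code used for this analysis: the six pitches are made up, only three attributes are used, K is 2 rather than 500, and the starting centers are picked at evenly spaced positions in the list to keep the example deterministic (real implementations usually start from random or k-means++ centers).

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))

def kmeans(points, k, iterations=20):
    # Simplified, deterministic start: k evenly spaced points from the input.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(col) / len(group) for col in zip(*group))
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Made-up pitches as (start_speed in mph, pfx_x, pfx_z in inches)
pitches = [
    (95.1, -4.0, 9.5), (94.7, -3.6, 10.1), (96.0, -4.4, 9.0),  # fastball-like
    (84.9, 2.1, -1.0), (85.6, 2.8, -0.4), (84.2, 1.7, -1.6),   # breaking-ball-like
]
centers, labels = kmeans(pitches, k=2)
print(labels)
```

With attributes on very different scales (mph vs. inches), it is common to standardize each column before clustering; the sketch skips that step.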

One important point to remember is that you, the analyst, must decide what the clusters represent. By looking at attributes of the pitches in a given cluster, you might identify the cluster as “lefty changeups” or “submariner fastballs” (which is actually a category you will discover).

The Problem of Clustering Pitchers

We can identify every pitch that a pitcher throws as belonging to a cluster from 1 to 500. Therefore, we know the distribution of pitch clusters for a given pitcher. The difficult problem, however, is how do we compare two pitchers using this information? Let’s say we have two pitchers:

  • Pitcher A’s pitches are 50% from cluster 1 and 50% from cluster 200.
  • Pitcher B’s pitches are 33% from cluster 1, 33% from cluster 300, and 33% from cluster 139.

The question remains, are Pitcher A and Pitcher B similar pitchers?
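A naive way to answer that question is to measure how much of their pitch mix the two pitchers share. The sketch below uses histogram intersection, which is my choice for illustration and not part of the analysis, on the two hypothetical pitchers above.

```python
def overlap(p, q):
    """Histogram intersection: probability mass two pitch-cluster mixes share."""
    return sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in set(p) | set(q))

pitcher_a = {1: 0.5, 200: 0.5}
pitcher_b = {1: 1 / 3, 300: 1 / 3, 139: 1 / 3}

print(round(overlap(pitcher_a, pitcher_b), 3))
```

The weakness is that this treats cluster 200 and cluster 300 as completely unrelated, even if they turn out to be nearly identical pitches. A direct comparison of cluster distributions cannot see that.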

The problem of clustering pitchers is a more complicated one than clustering pitches because we now have a collection of pitches instead of just individual pitches to compare. In order to cluster pitchers, I use a model that is typically used for topic modeling called Latent Dirichlet Allocation (LDA).

An Aside on LDA

In LDA for topic modeling, our data is a collection of documents.

Let’s imagine that our collection of documents is articles from the New York Times. There are global topics that govern how these articles are generated. For example, if you think of a newspaper, the topics might be sports, finance, health, politics, etc. Additionally, each article can be a mixture of these topics. We might imagine there is an article in the sports section titled, “Yankees payroll exceeds $300 million”, which our algorithm may discover is 50% about sports and 50% about finance.

Similar to what is mentioned above, the analyst must figure out what the topics actually are. You do not tell the algorithm that there is a sports topic. You discover that the topic is sports by observing that the most probable words are “baseball”, “Jeter”, “LeBron”, “touchdown”, etc. The algorithm will tell you that a particular document is 50% about topic 1 and 50% about topic 20, but you must ultimately infer what topic 1 and topic 20 are.

I am harping on this point mainly just to mention that there is no magic to these clustering algorithms. An algorithm can cluster data, but it cannot tell you what these clusters mean.

Relevance of LDA to Pitchers

Anyway, how can this model be used to analyze pitchers? We just need to use our imagination. Instead of a collection of documents, we now have a collection of pitcher seasons. Whereas each document is made up of a collection of words, each pitcher season is made up of a collection of pitches. We have already discretized each pitch using K-Means clustering in order to create our own “dictionary” of pitches. In our baseball model, we imagine that each pitcher is a mixture of repertoires, whereas in topic modeling, each document was a mixture of topics. We can then cluster pitchers together by figuring out who has the most similar repertoires.
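In code, the analogy amounts to treating each pitcher season as a “bag of pitches”, just as a document is treated as a bag of words. The sketch below assumes each pitch has already been mapped to a cluster ID; the pitcher names and ID lists are made up.

```python
from collections import Counter

# Hypothetical cluster IDs (1..500) for every pitch thrown in a pitcher season.
pitch_clusters_by_season = {
    ("Pitcher A", 2014): [1, 200, 1, 200, 200, 1],
    ("Pitcher B", 2014): [1, 300, 139, 300, 1, 139],
}

# One "document" per pitcher season: how often each pitch cluster appears.
corpus = {
    season: Counter(cluster_ids)
    for season, cluster_ids in pitch_clusters_by_season.items()
}

for season, counts in corpus.items():
    print(season, dict(counts))
```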

Nitty Gritty Details

If you are not interested in getting into the nitty gritty details, feel free to skip ahead to the next section to just see the cluster groupings.

  • Data used is from 2007-2014.
  • The dictionary of pitches (500 clusters) was created by running K-Means using all of the pitches from 2014. The choice of 2014 is arbitrary, but I used just one year’s worth of data because I thought it might be a sufficient amount and it was much quicker to run K-Means.
  • The PITCHf/x attributes that were used to cluster pitches were start_speed, pfx_x/pfx_z (horizontal/vertical movement), px/pz (horizontal/vertical location), vx0/vz0 (components of velocity).
  • For each pitcher from 2007-2014, each pitch was assigned to its closest cluster (determined by distance to the cluster center). I filtered out pitcher seasons in which the pitcher threw fewer than 500 pitches.
  • I then ran LDA on pitcher seasons, choosing the number of repertoires (topics) to be 5.
  • I used the method from this paper to get a vector representation of each pitcher season. I could have used the inferred repertoire proportions as my vector representations, but for various reasons, this did not produce clusters that were as nice.
  • Finally, I ran K-Means (K=100) on these vectors to get clusters of pitchers.
  • Whereas in topic modeling, it is often interesting to interpret what the global topics actually are, I am not really interested in what the global “repertoires” are for the model. I am really using LDA as a dimensionality reduction technique to produce smaller vectors (5 vs. 500) that can be clustered together.
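For readers who want to see what the LDA step does mechanically, here is a small collapsed Gibbs sampler in plain Python. This is a generic textbook-style sketch, not the implementation or the vector-representation method used above: the hyperparameters `alpha` and `beta`, the number of sweeps, and the toy corpus (6 pitch clusters and 2 repertoires instead of 500 and 5, with 0-based IDs) are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, sweeps=100, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions.

    docs: list of documents, each a list of integer word IDs in [0, vocab_size).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # tokens in doc d on topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # times word w is on topic k
    topic_total = [0] * n_topics                              # all tokens on topic k
    assignments = []

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            update(d, w, k, +1)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)  # remove the token's current topic
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                k_new = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k_new
                update(d, w, k_new, +1)

    # Smoothed share of each topic ("repertoire") in each document ("pitcher season").
    return [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]

# Toy corpus: each pitcher season is a list of pitch-cluster IDs.
seasons = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0, 0, 1],
    [4, 5, 4, 5, 5, 4, 4, 5],
    [4, 4, 5, 5, 4, 5, 0, 1],
]
proportions = lda_gibbs(seasons, n_topics=2, vocab_size=6)
for row in proportions:
    print([round(p, 2) for p in row])
```

Each returned row is a short vector that sums to 1, which is the kind of low-dimensional representation that can then be fed into K-Means.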

Some Observations

The actual clusters along with some relevant FanGraphs statistics are provided below. Each table is sortable. For brevity, I have only included clusters in which there are 10 or fewer pitchers. The one exception is the first cluster shown (Cluster 3), which has more than 10 pitchers and is included simply to demonstrate that a cluster can be quite big.

  • As is probably expected, clusters are almost always entirely righties or lefties even though this is not an input to the model.
  • Guys with similar numbers of batters faced cluster together. This is by design, as the way I determined the repertoire proportions accounts for the number of times a particular pitch is thrown.
  • Sometimes weird clusters can form, such as Cluster 37, which contains both Chapman and Wakefield. Cluster 37 is mostly cohesive with hard-throwing left-handers and I believe Wakefield ends up here simply because he did not fit well into any cluster.
  • This is not to say that the algorithm cannot find clusters of knuckleballers. Cluster 14 is all R.A. Dickey from years 2011-2014.
  • There are also other clusters that contain exclusively (or almost exclusively) one pitcher. Cluster 8 is 5 Kershaw years and one Hamels year. Cluster 68 is 5 Verlander years. I believe these clusters form partially because their stuff is so good. Another factor is that they might be able to repeat their mechanics so well that they remain in the same cluster because they are always throwing the same pitch types. There are other pitchers who fall into almost exclusively one cluster but who are joined by many other pitchers.
  • Clusters of individual pitchers also form when a pitcher has an incredibly unique style. Justin Masterson has his own cluster because he is such an extreme ground-ball pitcher. Josh Collmenter does as well due to the extreme rise he generates on his “fastball”.
  • Cluster 29 contains just Kershaw’s 2014 season and J.A. Happ’s 2009 season. If you do a Ctrl-F for J.A. Happ, he finds himself in some pretty flattering clusters. This is especially interesting because from 2007-2014, he does not have particularly good seasons, but he has been quite good the last two years. This is not to suggest that these clusters can uncover hidden gems, but it’s not fully out of the realm of possibility.
  • Most clusters produce quite similar ground-ball percentages. One of the factors that goes into clustering pitches (and therefore pitchers) is horizontal and vertical movement, which play a huge role in a pitcher’s ability to produce ground balls.
  • Submarine pitchers always end up together. Check out Clusters 9, 60, and 92.

Overall, I think this is pretty interesting stuff. I was honestly surprised that the clusters turned out to be as cohesive as they were. Additionally, besides being a descriptive tool, I have to wonder whether this information can be used for predictive purposes. For example, we often talk about regression to the mean when discussing a player’s performance, whether it be a pitcher or a batter. It is possible that the appropriate mean for many pitchers is the cluster mean that they happen to fall into.
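As a rough illustration of that last idea, here is the usual shrinkage form of regression to the mean, with the target mean left as a parameter so that a cluster mean can be swapped in for the league mean. Nothing here was tested in the analysis: the `stabilizer` constant and every number in the example are hypothetical.

```python
def regress_to_mean(observed, sample_size, target_mean, stabilizer):
    """Shrink an observed rate toward target_mean; small samples shrink more."""
    weight = sample_size / (sample_size + stabilizer)
    return weight * observed + (1 - weight) * target_mean

# Hypothetical pitcher: 50.0 GB% over 300 batters faced.
observed_gb, tbf = 50.0, 300
league_mean_gb, cluster_mean_gb = 45.0, 60.0  # made-up means
stabilizer = 300                              # made-up constant

print(regress_to_mean(observed_gb, tbf, league_mean_gb, stabilizer))
print(regress_to_mean(observed_gb, tbf, cluster_mean_gb, stabilizer))
```

The two calls pull the same observed number in opposite directions, which is why the choice of mean would matter for a pitcher in an extreme cluster such as the submariners.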

Cluster 3

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Chris Carpenter Cardinals 750 6.73 1.78 0.33 55.0 28.0 4.6 5.5
2010 Hiroki Kuroda Dodgers 810 7.29 2.20 0.69 51.1 32.1 8.0 4.3
2010 Gavin Floyd White Sox 798 7.25 2.79 0.67 49.9 32.1 7.6 4.1
2008 Hiroki Kuroda Dodgers 776 5.69 2.06 0.64 51.3 28.6 7.6 3.6
2012 Doug Fister Tigers 673 7.63 2.06 0.84 51.0 26.7 11.6 3.4
2011 Josh Beckett Red Sox 767 8.16 2.42 0.98 40.1 42.2 9.6 3.3
2011 Michael Pineda Mariners 696 9.11 2.89 0.95 36.3 44.8 9.0 3.2
2012 A.J. Burnett Pirates 851 8.01 2.76 0.80 56.9 24.3 12.7 3.0
2013 Rick Porcello Tigers 736 7.22 2.14 0.92 55.3 23.7 14.1 2.9
2008 Carlos Zambrano Cubs 796 6.20 3.43 0.86 47.2 34.9 9.0 2.8
2013 Andrew Cashner Padres 707 6.58 2.42 0.62 52.5 28.7 8.1 2.7
2012 Jeff Samardzija Cubs 723 9.27 2.89 1.03 44.6 33.1 12.8 2.7
2010 Scott Baker Twins 725 7.82 2.27 1.22 35.6 43.5 10.2 2.6
2014 Kyle Gibson Twins 757 5.37 2.86 0.60 54.4 26.6 7.8 2.3
2012 Tim Hudson Braves 749 5.13 2.41 0.60 55.5 25.2 8.3 2.1
2014 Henderson Alvarez Marlins 772 5.34 1.59 0.67 53.8 24.3 9.5 2.1
2008 Todd Wellemeyer Cardinals 807 6.29 2.91 1.17 39.3 39.8 10.6 2.0
2010 Rick Porcello Tigers 700 4.65 2.10 1.00 50.3 32.1 9.9 1.7
2011 Luke Hochevar Royals 835 5.82 2.82 1.05 49.8 32.2 11.5 1.7
2008 Jason Marquis Cubs 738 4.90 3.77 0.81 47.6 32.5 8.3 1.7
2014 Charlie Morton Pirates 666 7.21 3.26 0.51 55.7 22.8 8.8 1.6
2012 Luis Mendoza Royals 709 5.64 3.20 0.81 52.1 27.1 10.6 1.5
2009 Aaron Cook Rockies 675 4.44 2.68 1.08 56.5 24.7 14.2 1.4
2014 Doug Fister Nationals 662 5.38 1.32 0.99 48.9 34.2 10.1 1.4
2010 Mitch Talbot Indians 696 4.97 3.90 0.73 47.8 35.3 7.0 1.2
2008 Armando Galarraga Tigers 746 6.35 3.07 1.41 43.5 39.7 13.0 1.2
2008 Carlos Silva Mariners 689 4.05 1.88 1.17 44.0 33.3 10.4 1.2
2009 Ross Ohlendorf Pirates 725 5.55 2.70 1.27 40.6 42.1 11.1 1.2
2008 Vicente Padilla Rangers 757 6.68 3.42 1.37 42.7 38.1 12.5 1.1
2012 Luke Hochevar Royals 800 6.99 2.96 1.31 43.3 35.0 13.5 1.1
2012 Derek Lowe – – – 640 3.47 3.22 0.63 59.2 21.0 9.1 1.0
2013 Edinson Volquez – – – 777 7.50 4.07 1.00 47.6 29.6 11.9 0.9
2011 Chris Volstad Marlins 719 6.36 2.66 1.25 52.3 27.7 15.5 0.7
2010 Jeremy Bonderman Tigers 754 5.89 3.16 1.32 44.7 39.2 11.4 0.7
2010 Brad Bergesen Orioles 746 4.29 2.70 1.38 48.7 36.6 11.9 0.6
2014 Hector Noesi – – – 733 6.42 2.92 1.46 38.0 40.6 12.7 0.3
2009 Armando Galarraga Tigers 642 5.95 4.20 1.50 39.9 38.6 13.3 0.2
2008 Kyle Kendrick Phillies 722 3.93 3.30 1.33 44.3 28.7 14.0 0.1
2014 Roberto Hernandez – – – 722 5.74 3.99 1.04 49.7 29.9 12.2 0.0
2013 Lucas Harrell Astros 707 5.21 5.15 1.17 51.5 27.4 14.3 -0.8

 

Cluster 5

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 Cliff Lee – – – 843 7.84 0.76 0.68 41.9 40.4 6.3 7.0
2011 Cliff Lee Phillies 920 9.21 1.62 0.70 46.3 32.4 9.0 6.8
2009 Jon Lester Red Sox 843 9.96 2.83 0.89 47.7 34.5 10.6 5.3
2014 Jose Quintana White Sox 830 8.00 2.34 0.45 44.7 33.2 5.1 5.1
2013 Derek Holland Rangers 894 7.99 2.70 0.85 40.8 36.4 8.8 4.3
2012 Matt Moore Rays 759 8.88 4.11 0.91 37.4 42.9 8.6 2.7
2013 Wade Miley Diamondbacks 847 6.53 2.93 0.93 52.0 27.2 12.5 1.8

 

Cluster 6

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2007 CC Sabathia Indians 975 7.80 1.38 0.75 45.0 36.6 7.8 6.4
2014 Jake McGee Rays 274 11.36 2.02 0.25 38.0 42.9 2.9 2.6
2014 Tyler Matzek Rockies 503 6.96 3.37 0.69 49.7 30.3 8.3 1.7
2013 J.A. Happ Blue Jays 415 7.48 4.37 0.97 36.5 46.0 7.6 1.1
2010 J.A. Happ – – – 374 7.21 4.84 0.82 39.0 43.4 7.4 1.0
2009 Sean West Marlins 467 6.10 3.83 0.96 40.2 40.8 8.0 1.0
2009 Andrew Miller Marlins 366 6.64 4.84 0.79 48.0 30.0 9.3 0.7
2012 Drew Pomeranz Rockies 434 7.73 4.28 1.30 43.9 35.9 13.6 0.7
2013 Jake McGee Rays 260 10.77 3.16 1.15 42.5 38.8 12.9 0.6
2008 Jo-Jo Reyes Braves 512 6.21 4.14 1.43 48.5 31.8 15.5 0.2

 

Cluster 8

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Clayton Kershaw Dodgers 908 8.85 1.98 0.42 46.0 31.3 5.8 7.1
2011 Clayton Kershaw Dodgers 912 9.57 2.08 0.58 43.2 38.6 6.7 7.1
2012 Clayton Kershaw Dodgers 901 9.05 2.49 0.63 46.9 34.0 8.1 5.9
2010 Clayton Kershaw Dodgers 848 9.34 3.57 0.57 40.1 42.1 5.8 4.7
2009 Clayton Kershaw Dodgers 701 9.74 4.79 0.37 39.4 41.6 4.1 4.4
2010 Cole Hamels Phillies 856 9.10 2.63 1.12 45.4 37.9 12.3 3.5

 

Cluster 9

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Peter Moylan Braves 309 7.52 4.32 0.00 62.4 19.5 0.0 1.4
2014 Joe Smith Angels 285 8.20 1.81 0.48 59.1 25.9 8.0 1.0
2011 Joe Smith Indians 267 6.04 2.82 0.13 56.6 23.5 2.2 1.0
2009 Brad Ziegler Athletics 313 6.63 3.44 0.25 62.3 19.7 4.4 1.0
2013 Brad Ziegler Diamondbacks 297 5.42 2.71 0.37 70.4 10.8 12.5 0.6
2012 Brad Ziegler Diamondbacks 263 5.50 2.75 0.26 75.5 7.7 13.3 0.6
2012 Joe Smith Indians 278 7.12 3.36 0.54 58.0 24.9 8.3 0.6
2008 Cla Meredith Padres 302 6.27 3.07 0.77 66.8 17.3 15.8 0.3
2010 Peter Moylan Braves 271 7.35 5.23 0.71 67.8 21.3 13.5 -0.3

 

Cluster 14

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 R.A. Dickey Mets 927 8.86 2.08 0.92 46.1 34.1 11.3 5.0
2011 R.A. Dickey Mets 876 5.78 2.33 0.78 50.8 32.9 8.3 2.5
2014 R.A. Dickey Blue Jays 914 7.22 3.09 1.09 42.0 37.6 10.7 1.7
2013 R.A. Dickey Blue Jays 943 7.09 2.84 1.40 40.3 40.5 12.7 1.7

 

Cluster 16

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Max Scherzer Tigers 836 10.08 2.35 0.76 36.3 44.6 7.6 6.1
2014 Max Scherzer Tigers 904 10.29 2.57 0.74 36.7 41.6 7.5 5.2
2011 Daniel Hudson Diamondbacks 921 6.85 2.03 0.69 41.7 39.1 6.4 4.6
2012 Max Scherzer Tigers 787 11.08 2.88 1.10 36.5 41.5 11.6 4.4
2014 Jeff Samardzija – – – 879 8.28 1.76 0.82 50.2 30.5 10.6 4.1
2014 Lance Lynn Cardinals 866 8.00 3.18 0.57 44.3 36.0 6.1 3.4

 

Cluster 18

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Brandon Webb Diamondbacks 944 7.27 2.58 0.52 64.4 20.4 9.6 5.5
2013 Justin Masterson Indians 803 9.09 3.54 0.61 58.0 24.2 10.7 3.5
2012 Justin Masterson Indians 906 6.94 3.84 0.79 55.7 25.0 11.4 2.3
2011 Derek Lowe Braves 830 6.59 3.37 0.67 59.0 22.5 10.2 2.1

 

Cluster 20

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 John Danks White Sox 878 6.85 2.96 0.76 45.4 38.9 7.4 4.4
2010 Brian Matusz Orioles 760 7.33 3.23 0.97 36.2 45.0 7.9 3.0
2009 John Danks White Sox 839 6.69 3.28 1.26 44.2 40.9 11.5 2.7
2013 Felix Doubront Red Sox 705 7.71 3.94 0.72 45.6 34.4 7.8 2.2
2014 J.A. Happ Blue Jays 673 7.58 2.91 1.25 40.6 39.5 11.5 1.0

 

Cluster 24

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 CC Sabathia – – – 1023 8.93 2.10 0.68 46.6 31.7 8.8 7.3
2011 CC Sabathia Yankees 985 8.72 2.31 0.64 46.6 30.3 8.4 6.4
2010 David Price Rays 861 8.11 3.41 0.65 43.7 39.6 6.5 4.2

 

Cluster 29

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Clayton Kershaw Dodgers 749 10.85 1.41 0.41 51.8 29.2 6.6 7.6
2009 J.A. Happ Phillies 685 6.45 3.04 1.08 38.4 42.9 9.5 1.7

 

Cluster 35

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Chris Young Mariners 688 5.89 3.27 1.42 22.3 58.7 8.8 0.1
2014 Marco Estrada Brewers 624 7.59 2.63 1.73 32.7 49.5 13.2 -0.1

 

Cluster 36

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Justin Masterson Indians 908 6.58 2.71 0.46 55.1 26.7 6.3 4.2
2010 Justin Masterson Indians 802 7.00 3.65 0.70 59.9 24.9 10.0 2.3

 

Cluster 37

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 Aroldis Chapman Reds 276 15.32 2.89 0.50 37.3 42.9 7.4 3.3
2009 Matt Thornton White Sox 291 10.82 2.49 0.62 46.4 36.3 7.7 2.3
2008 Matt Thornton White Sox 268 10.29 2.54 0.67 53.0 27.4 10.9 1.7
2012 Drew Smyly Tigers 416 8.52 2.99 1.09 39.9 41.3 10.3 1.7
2008 Clayton Kershaw Dodgers 470 8.36 4.35 0.92 48.0 31.3 11.6 1.5
2008 Tim Wakefield Red Sox 754 5.82 2.98 1.24 35.5 48.9 9.1 1.1
2011 Tim Wakefield Red Sox 677 5.41 2.73 1.45 38.4 45.8 10.5 0.2

 

Cluster 38

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Cliff Lee Phillies 876 8.97 1.29 0.89 44.3 33.3 10.9 5.5
2008 Johan Santana Mets 964 7.91 2.42 0.88 41.2 36.4 9.4 5.3
2010 Jon Lester Red Sox 861 9.74 3.59 0.61 53.6 29.6 8.9 4.8
2012 CC Sabathia Yankees 833 8.87 1.98 0.99 48.2 30.7 12.5 4.7
2008 Jon Lester Red Sox 874 6.50 2.82 0.60 47.5 31.6 7.0 4.1
2013 Hyun-Jin Ryu Dodgers 783 7.22 2.30 0.70 50.6 30.5 8.7 3.6
2014 Wei-Yin Chen Orioles 772 6.59 1.70 1.11 41.0 37.5 10.5 2.4
2010 Jonathan Sanchez Giants 812 9.54 4.47 0.98 41.5 43.7 9.8 2.3
2014 Wade Miley Diamondbacks 866 8.18 3.35 1.03 51.1 28.0 13.9 1.6

 

Cluster 44

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Cole Hamels Phillies 850 8.08 1.83 0.79 52.3 32.6 9.9 4.9
2008 Cole Hamels Phillies 914 7.76 2.10 1.11 39.5 38.7 11.2 4.8
2008 John Danks White Sox 804 7.34 2.63 0.69 42.8 35.4 7.4 4.8
2009 Cole Hamels Phillies 814 7.81 2.00 1.12 40.4 38.7 10.7 3.9
2014 Danny Duffy Royals 606 6.81 3.19 0.72 35.8 46.0 6.1 1.9
2011 J.A. Happ Astros 698 7.71 4.78 1.21 33.0 44.2 10.2 0.6

 

Cluster 46

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 Roy Halladay Phillies 993 7.86 1.08 0.86 51.2 29.7 11.3 6.1
2013 Lance Lynn Cardinals 856 8.84 3.39 0.62 43.1 34.4 7.4 3.7
2008 Mike Pelfrey Mets 851 4.93 2.87 0.54 49.6 29.6 6.3 3.1
2009 A.J. Burnett Yankees 896 8.48 4.22 1.09 42.8 39.2 10.8 3.0
2010 Roberto Hernandez Indians 880 5.31 3.08 0.73 55.6 30.8 8.3 2.6
2009 Derek Lowe Braves 855 5.13 2.91 0.74 56.3 25.8 9.4 2.5
2010 Derek Lowe Braves 824 6.32 2.83 0.84 58.8 22.6 13.1 2.2

 

Cluster 49

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Aroldis Chapman Reds 202 17.67 4.00 0.17 43.5 34.8 4.2 2.8
2014 James Paxton Mariners 303 7.18 3.53 0.36 54.8 22.6 6.4 1.2
2013 Rex Brothers Rockies 281 10.16 4.81 0.67 48.8 32.5 9.3 0.9
2012 Antonio Bastardo Phillies 224 14.02 4.50 1.21 27.7 50.0 12.5 0.8
2012 Tim Collins Royals 295 12.01 4.39 1.03 40.9 42.8 11.8 0.7
2012 Christian Friedrich Rockies 377 7.87 3.19 1.49 42.2 34.6 15.4 0.7
2013 Justin Wilson Pirates 295 7.21 3.42 0.49 53.0 30.0 6.7 0.6
2011 Aroldis Chapman Reds 207 12.78 7.38 0.36 52.7 30.8 7.1 0.5
2014 Justin Wilson Pirates 256 9.15 4.50 0.60 51.3 34.4 7.3 0.2
2011 Mike Dunn Marlins 267 9.71 4.43 1.29 38.5 46.0 12.2 -0.2

 

Cluster 51

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Cliff Lee – – – 969 7.03 1.67 0.66 41.3 36.5 6.5 6.3
2009 CC Sabathia Yankees 938 7.71 2.62 0.70 42.9 37.3 7.4 5.9
2010 CC Sabathia Yankees 970 7.46 2.80 0.76 50.7 34.1 8.6 5.1

 

Cluster 54

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Hisashi Iwakuma Mariners 709 7.74 1.06 1.01 50.2 28.7 13.2 3.1
2009 Justin Masterson – – – 568 8.28 4.18 0.84 53.6 31.4 10.4 1.5
2014 Justin Masterson – – – 592 8.11 4.83 0.84 58.2 21.6 14.6 0.4

 

Cluster 58

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 David Price – – – 1009 9.82 1.38 0.91 41.2 38.1 9.7 6.0
2014 Jon Lester – – – 885 9.01 1.97 0.66 42.4 37.0 7.2 5.6
2012 Gio Gonzalez Nationals 822 9.35 3.43 0.41 48.2 30.0 5.8 5.0
2011 David Price Rays 918 8.75 2.53 0.88 44.3 36.9 9.7 4.4
2013 Gio Gonzalez Nationals 819 8.83 3.50 0.78 43.9 33.3 9.7 3.2
2011 Gio Gonzalez Athletics 864 8.78 4.05 0.76 47.5 34.1 8.9 3.1
2010 Gio Gonzalez Athletics 851 7.67 4.13 0.67 49.3 35.3 7.4 3.1

 

Cluster 60

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Brad Ziegler – – – 239 6.79 2.93 0.00 68.6 13.4 0.0 1.0
2007 Cla Meredith Padres 342 6.67 1.92 0.68 72.0 13.6 17.1 1.0
2008 Brad Ziegler Athletics 229 4.53 3.32 0.30 64.7 18.8 6.3 0.5
2013 Joe Smith Indians 259 7.71 3.29 0.71 49.1 30.1 9.6 0.5
2008 Chad Bradford – – – 241 2.58 2.28 0.46 66.5 16.0 9.4 0.4
2012 Cody Eppley Yankees 194 6.26 3.33 0.59 60.3 19.1 11.1 0.3
2008 Joe Smith Mets 271 7.39 4.41 0.57 62.6 17.9 12.5 0.3
2009 Cla Meredith – – – 283 5.10 3.44 0.55 62.9 21.1 8.9 0.2
2010 Brad Ziegler Athletics 257 6.08 4.15 0.59 54.4 26.9 8.2 0.1
2014 Brad Ziegler Diamondbacks 281 7.25 3.22 0.67 63.8 18.9 13.5 0.1

 

Cluster 68

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Justin Verlander Tigers 982 10.09 2.36 0.75 36.0 42.8 7.4 7.7
2012 Justin Verlander Tigers 956 9.03 2.27 0.72 42.3 35.6 8.3 6.8
2011 Justin Verlander Tigers 969 8.96 2.04 0.86 40.2 42.1 8.8 6.4
2010 Justin Verlander Tigers 925 8.79 2.85 0.56 41.0 40.3 5.6 6.3
2013 Justin Verlander Tigers 925 8.95 3.09 0.78 38.4 38.9 7.8 4.9

 

Cluster 69

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Manny Parra Brewers 741 7.97 4.07 0.98 51.6 26.6 13.5 2.3
2014 Drew Smyly – – – 618 7.82 2.47 1.06 36.6 43.4 9.5 2.2
2012 J.A. Happ – – – 627 8.96 3.48 1.18 44.0 38.9 11.9 1.9

 

Cluster 70

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Gerrit Cole Pirates 571 9.00 2.61 0.72 49.2 31.8 9.4 2.3
2009 Luke Hochevar Royals 631 6.67 2.90 1.45 46.6 35.8 13.8 1.0
2012 Joe Kelly Cardinals 457 6.31 3.03 0.84 51.7 27.5 11.0 0.9
2008 Sidney Ponson – – – 612 3.85 3.18 0.93 54.5 26.2 10.9 0.9
2013 Joe Kelly Cardinals 532 5.73 3.19 0.73 51.1 28.2 8.9 0.7
2009 Roberto Hernandez Indians 596 5.67 5.03 1.15 55.2 27.0 13.7 0.0

 

Cluster 71

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Chris Young Padres 434 8.18 4.22 1.14 21.7 53.4 8.7 1.4
2012 Chris Young Mets 493 6.26 2.82 1.25 22.3 58.2 7.7 1.2
2013 Josh Collmenter Diamondbacks 384 8.32 3.23 0.78 32.7 46.8 6.9 1.0
2012 Josh Collmenter Diamondbacks 375 7.97 2.19 1.30 37.4 43.1 11.5 0.8
2009 Chris Young Padres 336 5.92 4.74 1.42 30.2 51.7 10.0 0.0

 

Cluster 72

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Madison Bumgarner Giants 873 9.07 1.78 0.87 44.4 35.8 10.0 4.0
2013 Jon Lester Red Sox 903 7.47 2.83 0.80 45.0 35.4 8.3 3.5

 

Cluster 77

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Josh Collmenter Diamondbacks 621 5.83 1.63 0.99 33.3 47.0 7.7 2.3
2014 Josh Collmenter Diamondbacks 719 5.77 1.96 0.90 38.8 39.9 8.3 1.9

 

Cluster 78

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2007 Rich Hill Cubs 812 8.45 2.91 1.25 36.0 42.9 11.7 3.1
2014 Tyler Skaggs Angels 464 6.85 2.39 0.72 50.1 30.9 8.7 1.5
2011 Danny Duffy Royals 474 7.43 4.36 1.28 37.5 40.3 11.5 0.5
2010 Manny Parra Brewers 560 9.52 4.65 1.33 47.2 34.5 14.8 0.3

 

Cluster 79

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 David Price Rays 836 8.74 2.52 0.68 53.1 27.0 10.5 5.0
2011 C.J. Wilson Rangers 915 8.30 2.98 0.64 49.3 31.9 8.2 4.9
2010 C.J. Wilson Rangers 850 7.50 4.10 0.44 49.2 33.5 5.3 4.1
2013 C.J. Wilson Angels 913 7.97 3.60 0.64 44.4 33.4 7.2 3.2
2012 Madison Bumgarner Giants 849 8.25 2.12 0.99 47.9 33.3 11.7 3.1
2011 Derek Holland Rangers 843 7.36 3.05 1.00 46.4 33.6 11.0 3.0
2012 Wandy Rodriguez – – – 875 6.08 2.45 0.92 48.0 31.6 10.1 2.5
2014 Jason Vargas Royals 790 6.16 1.97 0.91 38.3 38.7 8.2 2.2
2012 C.J. Wilson Angels 865 7.70 4.05 0.85 50.3 29.9 10.8 2.2

 

Cluster 85

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 Cliff Lee Phillies 847 8.83 1.19 1.11 45.0 36.9 11.8 5.0
2014 Cole Hamels Phillies 829 8.71 2.59 0.62 46.4 31.1 8.2 4.3
2009 Wandy Rodriguez Astros 849 8.45 2.76 0.92 44.9 37.1 9.9 4.1
2012 Wade Miley Diamondbacks 807 6.66 1.71 0.65 43.3 33.7 6.9 4.1
2013 Jose Quintana White Sox 832 7.38 2.52 1.03 42.5 37.4 10.2 3.5
2009 Andy Pettitte Yankees 834 6.84 3.51 0.92 42.9 37.8 8.9 3.4
2012 Wei-Yin Chen Orioles 818 7.19 2.66 1.35 37.1 42.1 11.7 2.3

 

Cluster 86

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Josh Beckett Red Sox 883 8.43 2.33 1.06 47.2 31.7 12.8 4.2
2010 Max Scherzer Tigers 800 8.46 3.22 0.92 40.3 40.0 9.6 3.7
2014 Nathan Eovaldi Marlins 854 6.40 1.94 0.63 44.8 32.9 6.6 2.9
2012 Lucas Harrell Astros 827 6.51 3.62 0.60 57.2 22.5 9.7 2.8
2013 Jeff Samardzija Cubs 914 9.01 3.29 1.05 48.2 31.4 13.3 2.7
2011 Max Scherzer Tigers 833 8.03 2.58 1.34 40.3 39.5 12.6 2.2
2009 Mike Pelfrey Mets 824 5.22 3.22 0.88 51.3 30.0 9.5 1.7
2011 Roberto Hernandez Indians 833 5.20 2.86 1.05 54.8 26.6 13.0 0.9

 

Cluster 92

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Steve Cishek Marlins 275 11.57 2.89 0.41 42.7 31.1 5.9 2.0
2007 Sean Green Mariners 304 7.01 4.50 0.26 60.9 18.8 5.1 0.7
2008 Sean Green Mariners 358 7.06 4.10 0.34 63.3 19.5 6.1 0.7
2011 Shawn Camp Blue Jays 292 4.34 2.98 0.41 53.5 25.7 5.2 0.3
2010 Shawn Camp Blue Jays 298 5.72 2.24 1.00 52.0 31.4 11.1 0.2

 

Cluster 95

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Cliff Lee Indians 891 6.85 1.37 0.48 45.9 35.1 5.1 6.7
2012 Cole Hamels Phillies 867 9.03 2.17 1.00 43.4 35.1 11.9 4.6
2013 Cole Hamels Phillies 905 8.26 2.05 0.86 42.7 36.7 9.1 4.5
2008 Scott Kazmir Rays 641 9.81 4.14 1.36 30.8 48.9 12.0 2.0

 

Cluster 97

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Jered Weaver Angels 926 7.56 2.14 0.76 32.5 48.6 6.3 5.7
2009 Jered Weaver Angels 882 7.42 2.82 1.11 30.9 50.4 8.3 3.9
2014 Chris Tillman Orioles 871 6.51 2.86 0.91 40.6 39.3 8.3 2.3
2009 Joe Blanton Phillies 837 7.51 2.72 1.38 40.6 39.5 12.9 2.2
2013 Chris Tillman Orioles 845 7.81 2.97 1.44 38.6 39.8 14.2 1.9

 


A Year In xISO

For the type of baseball fan I’ve become — one who follows the sport as a whole rather than focuses on a particular team — 2016 was the season of Statcast. Even for those who watch the hometown team’s broadcast on a nightly basis, exit velocity and launch angle have probably become familiar terms. While Statcast was around last season, it seems fans and commentators alike have really embraced it in 2016.

Personally, I commend MLB for democratizing Statcast data, at least partially, especially when they are under no apparent obligation to do so. I’ve enjoyed the Statcast Podcast this season, but most of all, I’ve benefited from the tools available at Baseball Savant. It is those tools that have allowed me to explore xISO. I first introduced xISO as an attempt to incorporate exit velocity into a player’s expected isolated slugging. I subsequently updated the model and discussed some notable first half players. Alex Chamberlain was kind enough to include my version of xISO in the RotoGraphs x-stats Omnibus, and I’ve been maintaining a daily updated xISO resource ever since.

Happily for science, all of my 2016 first half “Overperformers” saw ISO declines in the second half, while most of my first half “Underperformers” saw large drops in second half playing time. Rather than focus on individuals, though, let’s try to estimate the predictive value of xISO in 2016.

Yuck. This plot shows how well first-half ISO predicted second-half ISO, compared to how well first-half xISO predicted the same, for 2016 first AND second-half qualified hitters. Both of these are calculated using the model as it was at the All-Star break. There are two takeaways: First-half ISO was a pretty bad predictor of second-half ISO, and first-half xISO was also a pretty bad predictor of second-half ISO. Mercifully though, first-half xISO was a bit better than ISO at predicting future ISO. This is consistent with the findings in my first article, and a basic requirement I set out to satisfy.

Now, an interesting thing happened recently. After weeks of hinting, Mike Petriello unveiled “Barrels”. Put simply, Barrels are meant to be a classification of the best kind of batted balls. Shortly thereafter, Baseball Savant began tabulating total Barrels, Barrels per batted ball (Brls/BBE), and Barrels per plate appearance (Brls/PA). In a way, this is similar to Andrew Perpetua’s approach to using granular batted-ball data to track expected outcomes for each batted ball, except that the Statcast folks have taken only a slice of launch angles and exit velocities to report as Barrels.

By definition, these angles and velocities are those for which the expected slugging percentage is over 1.500, so it would appear that this stat could be a direct replacement for my xISO. Not so fast! First of all, because ISO is on a per at-bat (AB) basis, we definitely need to calculate Brls/AB from Brls/PA. This is not so hard if we export a quick FanGraphs leaderboard. Let’s check how well Brls/AB works in a single-predictor linear model for ISO:

Not too bad. The plot reports both R-squared and adjusted R-squared, for comparison with multiple regression models. I won’t show it, but this is almost exactly the coefficient of determination that my original xISO achieves with the same training data. I still notice a hint of nonlinearity, and I bet we can do better.
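For readers who want to try something similar, here is a minimal Python sketch of the two steps described above: converting Brls/PA to Brls/AB, then fitting a single-predictor line and reporting both R-squared and adjusted R-squared. The function names (`brls_per_ab`, `fit_line`) are hypothetical placeholders, and the inputs are assumed to come from a leaderboard export with PA and AB columns. It is an illustration of the arithmetic, not the exact code behind the plots.

```python
def brls_per_ab(brls_per_pa, pa, ab):
    """Convert Barrels per plate appearance into Barrels per at-bat."""
    # Total Barrels = (Brls/PA) * PA, so dividing by AB rescales the rate.
    return brls_per_pa * pa / ab


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared, adjusted_r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot

    # Adjusted R-squared penalizes extra predictors; here there is one (p = 1).
    p = 1
    adjusted = 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)
    return intercept, slope, r_squared, adjusted
```

With one predictor the adjustment is small, but it becomes the fairer number to quote once more terms are added to the model.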

Hey now, that’s nice. In terms of adjusted R-squared, we’ve picked up about 0.06, which is not insignificant. The correlation plot also looks better to my eye. So what did I do? As is my way, I added a second-order term, and sprinkled in FB% and GB% as predictors. The latter two are perhaps controversial inclusions. FB% and/or GB% might be suspected to be strongly correlated with Brls/AB, introducing some undesired multicollinearity. While I won’t show the plots, it doesn’t actually turn out to be a big problem in this case. Both FB% and GB% have Pearson correlation coefficients close to 0.5 with Brls/AB (negative correlation in the case of GB%). Here’s the functional form of the multiple regression model plotted above, which was trained on all 2016 qualified hitters:

To be honest, there is something about my first model that I liked better. This version, using Barrels, feels like a bit of a half-measure between Andrew Perpetua’s bucketed approach and my previous philosophy of using only average exit-velocity values and batted-ball mix. My original intent was to create a metric that could be easily calculated from readily available resources, so in that sense, I’m still succeeding. Going forward, I will be calculating both versions on my spreadsheet. I’m excited to see which version serves the community better heading into 2017!

As always, I’m happy to entertain comments, questions, or criticisms.


Did the Cubs and Giants Have the Best Pitcher-Hitting Series Ever?

With a wild comeback in Game 4 on Tuesday night, the Cubs secured their spot in the NLCS for the second straight season. Considering where the team was just five years ago, this is obviously an impressive achievement. But maybe more impressive is how they reached that second consecutive NLCS. The Cubs scored 17 runs against the Giants in their NLDS showdown, and six of those were driven in by their pitchers! That’s an absurd 35% of the Cubs’ run output coming from the guys who usually do the run prevention.

When Travis Wood hit his incredible home run as a relief pitcher in Game 2, it was the first postseason home run from a pitcher since Joe Blanton took Edwin Jackson deep in Game 4 of the 2008 World Series, and the first postseason home run from a reliever since 1924.

When Jake Arrieta left the yard in the first inning of the very next game, it became the first postseason series with multiple home runs off the bats of pitchers since the 1968 World Series, when Mickey Lolich and Bob Gibson each went deep in a seven-game series. Of course, Lolich and Gibson were rivals, not teammates, making the Wood-Arrieta accomplishment even more impressive, and rarer. In fact, it was only the second time in the history of baseball (per Baseball-Reference Play Index) that two pitchers on the same team hit home runs in the same series. The only other time was in the 1924 World Series, when New York Giants teammates Jack Bentley and Rosy Ryan, both pitchers, homered in Games 3 and 5 of the epic seven-game series. Wood and Arrieta were the only ones to do so in back-to-back games.

* * *

Now, it wasn’t just the Cubs pitchers getting in on the fun. For a while Tuesday night, it looked as though Giants starter Matt Moore was going to be a two-fold hero, shutting down the Cubs offense from the mound and knocking in the first run of the game for the Giants in the bottom of the fourth. While that was the only hit from Giants pitchers in the series, it was still enough to bring the combined hitting totals for the two teams’ pitchers to a .250 batting average and a .625 slugging percentage, while knocking in 23 percent of the total runs scored.

Those are some pretty crazy totals, but are they the best ever?

Using the aforementioned Play Index search of all-time postseason home runs from pitchers, there are 18 different series (including the 2016 NLDS) in which a pitcher homered. In those series, on three occasions, the pitcher who hit the home run was the only pitcher to get a hit in the entire series (1984 Rick Sutcliffe, 1978 Steve Carlton, 1975 Don Gullett). Only twice did pitchers combine for more than the 10 total bases from the Giants and Cubs, and only once did they drive in more than the seven runs (and they never topped the percent of runs driven in). Let’s go to the chart:

Top Team Pitcher Performances in the Playoffs

Series      Hits  AB   BA     TB   SLG    RBI  Series Runs  RBI % of Runs
2016 NLDS   4     16   0.250  10   0.625  7    30           23.33
2008 WS     2     13   0.154  5    0.385  1    39           2.56
2006 NLCS   2     25   0.080  5    0.200  1    55           1.82
2003 NLCS   3     28   0.107  6    0.214  3    82           3.66
1984 NLCS   4     17   0.235  7    0.412  1    48           2.08
1978 NLCS   2     17   0.118  5    0.294  4    38           10.53
1975 NLCS   2     12   0.167  5    0.417  3    26           11.54
1974 WS     4     20   0.200  8    0.400  1    27           3.70
1970 WS     2     25   0.080  5    0.200  4    53           7.55
1970 ALCS   5     18   0.278  10   0.556  6    37           16.22
1969 WS     5     26   0.192  10   0.385  5    24           20.83
1968 WS     5     36   0.139  11   0.306  4    63           6.35
1967 WS     2     30   0.067  8    0.267  2    46           4.35
1965 WS     5     32   0.156  9    0.281  6    44           13.64
1958 WS     7     37   0.189  10   0.270  8    54           14.81
1940 WS     3     39   0.077  7    0.179  2    50           4.00
1926 WS     4     39   0.103  8    0.205  2    52           3.85
1924 WS     8     42   0.190  14   0.333  5    53           9.43
1920 WS     6     39   0.154  9    0.231  3    29           10.34
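
The rate columns in the chart follow directly from the counting columns, so they are easy to check. A small Python sketch of that arithmetic, using the 2016 NLDS row as the example (the function name `pitcher_hitting_rates` is a hypothetical placeholder):

```python
def pitcher_hitting_rates(hits, at_bats, total_bases, rbi, series_runs):
    """Derive the chart's rate columns from its counting columns."""
    return {
        "BA": hits / at_bats,                  # batting average
        "SLG": total_bases / at_bats,          # slugging percentage
        "RBI_pct": 100.0 * rbi / series_runs,  # share of all series runs driven in
    }


# 2016 NLDS row from the chart: 4 hits, 16 AB, 10 TB, 7 RBI, 30 series runs.
nlds_2016 = pitcher_hitting_rates(4, 16, 10, 7, 30)
print(f"{nlds_2016['BA']:.3f} {nlds_2016['SLG']:.3f} {nlds_2016['RBI_pct']:.2f}")
```

Note that the last column divides pitcher RBI by the runs scored by both teams in the series, which is why short, low-scoring series fare so well in it.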

After a brief perusal, it’s clear that there are only a few cases in which the pitchers in a series can even come close to what we just saw. Let’s take a look at the five best, in ascending order:

1968 World Series

This was one of the three series before the 2016 NLDS in which multiple pitchers hit home runs. In 1968, as noted above, it was Bob Gibson and Mickey Lolich who homered, one each for the Cardinals and Tigers. The reason this series ranks fifth among the challengers to Cubs-Giants is that those two pitchers were really it. They drove in the only four runs from pitchers in the series (three of the four RBI coming on the two home-run swings), and there was only one hit from a non-Gibson/Lolich pitcher.

1969 World Series

Just a year after our first entry into this challenge, the Mets and Orioles played in the first World Series to be preceded by a League Championship Series. The extra-long season didn’t stop the Mets and Orioles pitchers from contributing all over the diamond, however, as they crammed five hits, 10 total bases, and five RBI into just a five-game series. Because of the abbreviated length of the series, this is one of the few series that can challenge the 2016 NLDS in terms of percentages. That being said, the Cubs-Giants pitchers take all three percentage categories, leaving no real room for debate on this one.

1958 World Series

The 1958 series stands out for having the highest RBI total for pitchers in any postseason series to date. That was thanks in large part to the top two pitchers for the Braves, Warren Spahn and Lew Burdette, tallying three RBI apiece. Burdette did it with the long ball, while Spahn preferred the death-by-a-thousand-cuts method, tallying his three RBI on four hits in the series. The Yankees got two RBI of their own from Bob Turley, but I’m not quite willing to give these guys the edge over the Cubs-Giants pitchers. The easiest argument for this year’s NLDS is that the Cubs-Giants pitchers tallied as many total bases and only one fewer RBI in three fewer games, as the 1958 World Series went to seven games, while this year’s NLDS went just four games.

1924 World Series

Here’s where the challenge gets real stiff. The 1924 World Series is the other series in which we have two home runs from pitchers on the same team, the aforementioned Giants teammates Bentley and Ryan. This series tops our charts in hits (8) and total bases (14), and is a reasonable choice for best-hitting series from a group of pitchers. I’m still giving the edge to Cubs-Giants in this showdown, though, and for a couple of reasons. Actually, really one reason with a couple different explanations: opportunity. Similar to the 1958 World Series, the 1924 World Series went to seven games, meaning that pitchers had far more games to rack up those hits and total bases. Pitchers were also left in games far longer in the 1920s, and as such, tallied almost three times as many at-bats as the 2016 NLDS pitchers. When comparing batting average (.250 to .190) and, even more so, slugging percentage (.625 to .333), it becomes clear that this year’s Cubs-Giants pitchers still reign supreme.

1970 ALCS

Here’s our winner. The only series that I believe tops the recently concluded Cubs-Giants NLDS in terms of output from pitchers at the plate. This was an even shorter series than Cubs-Giants, as the Orioles only needed three games to dispatch the Twins. And their pitchers were a good chunk of the reason why. The Orioles used just four pitchers in the series, but all four got hits, combining for all of the offense you see above. (Twins pitchers were 0-for-5 in the series.) Not only did all four get hits, but all three starters got extra-base hits, as Dave McNally, Jim Palmer, and Mike Cuellar (Dick Hall was the reliever) all showed what they were capable of on the other side of the ball. Of course, the very next season, these three starters, along with Pat Dobson, would form just the second-ever set of four 20-game winners on the same team, proving just how awesome the late ’60s and early ’70s Orioles really were. They reign supreme for now, but let’s see how those Cubs starting pitchers do for the rest of the 2016 playoffs.


Let’s Get the Twins to the World Series

Imagine for a second that MLB Commissioner Rob Manfred has gone senile. I know that’s a ridiculous premise, and this is sure to be a ridiculous post, but bear with me. Commissioner Manfred, perhaps after a long night of choice MLB-sponsored adult beverages, has placed the Minnesota Twins in the playoffs. Yes, the same Twins of the .364 win percentage and facial hair promotional days. What is the probability that they make or win the World Series? For simplicity, let’s say they take the place of both AL Wild Card teams and are just inserted into the divisional playoffs.

We are going to look at a bunch of ways of estimating the probability the Twins win a five-game series or a seven-game series, then multiply our results accordingly to find an estimate for the team reaching each round. We’ll start simply, and gradually progress to more complicated methods of estimation. Let’s start as simply as possible, then, and use the Twins’ .364 win percentage.  The probability of the Twins winning a five-game series (at least three out of five games) is 25.7%. The same process gives them a 22.4% chance of winning a seven-game series. Multiplying these out gives the Twins a 5.8% chance of reaching the World Series (roughly 1 in 17) and a 1.3% chance of winning it. For reference, those are nearly the same odds FanGraphs gave the Mets of reaching/winning the World Series on October 2nd. Of course, those Mets also had to get through the Wild Card round (and the greatest frat boy to ever pitch a playoff game), but failed to do so.
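
The simple method above is a binomial calculation: treat every game as an independent coin flip with a fixed win probability and add up the ways to win a majority. A minimal Python sketch, assuming that independence and ignoring home field (the function name `series_win_prob` is a hypothetical placeholder):

```python
from math import comb


def series_win_prob(p, games):
    """Probability of winning a majority of `games` independent games,
    each won with probability p."""
    need = games // 2 + 1  # 3 of 5, 4 of 7
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )


p = 0.364                        # Twins' regular-season win percentage
p5 = series_win_prob(p, 5)       # divisional round
p7 = series_win_prob(p, 7)       # LCS or World Series
reach_ws = p5 * p7               # win the five-game series, then one seven-game series
win_ws = p5 * p7 * p7            # ...then win a second seven-game series
print(f"{p5:.3f} {p7:.3f} {reach_ws:.3f} {win_ws:.3f}")
```

Summing over all five (or seven) games as if they were always played gives the same answer as stopping when the series is clinched, since the leftover games cannot change who won.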

Okay, so maybe you didn’t like that method because we included the Twins’ entire regular season, instead of just including games against playoff teams. Noted, but just understand that the Twins had basically the same win percentage against playoff teams (.365) as their overall percentage. Just to note, I defined playoff teams as the six division winners plus the four wild card teams. Using the Twins’ percentage against playoff teams yields virtually the same probabilities as above.

How else can we attack this problem? Well, the Twins played 162 games this year, which means they have 158 different five-game stretches and 156 seven-game stretches. Over all those five-game rolling “series”, the Twins won at least three games 24.1% of the time, and they won at least four games in 25% of their seven-game tilts. Multiplying those figures out gives them a 6% chance of reaching the World Series and a 1.5% chance of becoming world champs.
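
The rolling "series" count can be sketched in a few lines of Python. This assumes the season is available as a chronological list of 1s (wins) and 0s (losses); the function name `rolling_series_share` and that input format are assumptions for illustration.

```python
def rolling_series_share(results, length):
    """Share of consecutive `length`-game stretches with a majority of wins.

    `results` is a chronological list of 1 (win) / 0 (loss).
    Returns (share_of_stretches_won, number_of_stretches).
    """
    need = length // 2 + 1
    n_windows = len(results) - length + 1  # 162 games -> 158 five-game stretches
    won = sum(
        1
        for start in range(n_windows)
        if sum(results[start:start + length]) >= need
    )
    return won / n_windows, n_windows
```

Keep in mind that overlapping stretches share games, so these are not independent samples; the share is best read as a rough descriptive figure rather than a clean probability estimate.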

Again, those numbers are unsatisfying because they include all teams, not just the playoff teams. However, removing the non-playoff teams leaves us with a bit of a sample issue because the Twins played only 52 games against playoff teams. So, let’s change the problem slightly: what is the probability that a last-place team can reach, and win, the World Series? The teams I’ll be considering all finished last in their respective divisions: Twins, Athletics, Rays, Braves, Reds, and Padres. Cumulatively, these teams had a win percentage of .412, won 37.4% of their games against playoff teams, won at least three games in 30.6% of their five-game stretches, and won at least four out of seven 29.9% of the time. You can multiply these percentages out and get some answers.

I’m still not satisfied, so there is one more tool I’m gonna break out: a bootstrap simulation. Bootstrapping basically means sampling with replacement, which means every time I randomly choose a game from the sample, that game is thrown back in and has the same exact chance of getting picked again. This resampling with replacement process gives the bootstrap some pretty useful properties that I won’t get into here, but you can check here for more info.

I’m going to put all the games the last-place teams played against playoff teams into a pile. I’m going to randomly sample five games from that pile, with replacement, and count how many games were wins. I’m going to do this 100,000 times. I will then divide the number of samples that included at least three wins by the total number of samples, giving me an estimated probability of these last-place teams winning a five-game series against a playoff team. I will repeat this process for a seven-game series.
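
The procedure just described can be sketched in Python as follows. This is an illustration under assumptions, not the exact code used for the post: the pile of games is assumed to be a list of 1s (wins) and 0s (losses), and the function name `bootstrap_series_prob` and the fixed seed are placeholders.

```python
import random


def bootstrap_series_prob(results, length, n_samples=100_000, seed=0):
    """Estimate the chance of winning a `length`-game series by resampling.

    `results` is the pile of game outcomes (1 = win, 0 = loss). Each simulated
    series draws `length` games from the pile with replacement.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    need = length // 2 + 1
    series_won = 0
    for _ in range(n_samples):
        sample = rng.choices(results, k=length)  # sampling with replacement
        if sum(sample) >= need:
            series_won += 1
    return series_won / n_samples
```

Because every draw is independent and comes from the same pile, the estimate should land close to the binomial calculation that uses the pile’s overall win rate, with a little simulation noise on top.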

The bootstrap probability of a last-place team winning a five-game series against a playoff team was 27%. The probability of them winning a seven-game series was 24%. They have a 6.5% chance of reaching the World Series and 1.6% chance of winning it.

Honestly, these probabilities are lower than I expected. I have believed in and learned to embrace the randomness of the MLB postseason. I went into this post expecting the outcome to highlight just how random the postseason really is, even absurdly so. However, the randomness of the postseason really depends on the extremely small differences between all the teams at the top, so inserting teams from the very bottom of the league introduces a level of certainty that would be new to the playoffs. Still, imagine repeating a similar exercise for the NFL or NBA. The 27% or so chance I’d give the Twins of advancing seems much higher than the probability of, say, the Cleveland Browns winning a playoff game if inserted into the postseason.

My methodology was clearly very simple, but intentionally so. I gave no acknowledgement to a home-field advantage adjustment, and I looked only at the team’s W-L record. A more complex method could have taken into consideration Pythagorean Expectation or BaseRuns.

This was a ridiculous post and ultimately a meaningless exercise. The Twins probably couldn’t reach the World Series if they were placed in the playoffs, but I’ll point out that as of this writing (October 10th during Game 3 of Nationals-Dodgers) the Cubs also probably won’t reach the World Series. Baseball is a weird and wonderful sport, and the postseason is the weirdest and most wonderful time of the year. If the Twins could conceivably reach the World Series as currently constructed, don’t think too hard about what’s happening and just enjoy.


53 Things About a 53-Second Finnish Baseball Video

With no baseball being played on this Monday night as I write this, I thought I’d throw this out for a quick fix.  Granted, this is baseball as it’s played in Finland:

 

Below is a second-by-second recap of all the glorious action.

{note – because the Stone-Age author doesn’t know how to post GIFs into an article, you’ll have to pause the video yourself to freeze the action for each of the 53 seconds}

0:01 – Dude in the white-striped uniform way off the plate, obviously trying to avoid catcher’s interference because of the dude in the orange-and-blue uniform.

0:02 – Orange-and-blue apparently spots the pitcher striding towards the pitcher’s mound, which I guess in Finnish is the “tikli”.

0:03 – There’s a “ski” on the back of the hitter’s jersey, so he must be Sami Haapakoski.  Not likely to be another Polish guy on a Finnish baseball team.

0:04 – And he’s got his hands backwards.  (I’d love to see how he holds a light bulb to screw it in)

0:05 – And now the catcher flips the ball up in the air!  A combination hidden-ball trick/quick-pitch.

0:06 – First baseman charging in…Sami charging at the offering, which can only mean…

0:07 – A line drive over the first baseman’s head.  Well played Sami!

0:08 – Sami now runs down the THIRD-BASE LINE!!!! (being half-Polish myself I have no more capacity to joke).  This means that the runner who’s already there (Jeano Segurannen) has to start running to second.

0:09 – What’s with the water hazard inside the park?  I guess with this being Finnish baseball, they’ve replaced right field with a right fjord.

0:10 – I like the greenery in right fjord.  Gives it a Wrigley-like ambiance (this is the Obligatory 2016 Cubs Reference™ for this article)

0:11 – Crowd going wild, screaming for Sami to run the bases the right way and not blow a well-earned ground-rule double.

0:12 – Or maybe it’s a ground-rule triple if it gets stuck in the poison ivy.  Not sure.

0:13 – Love the hustle on the guy in right fjord.  Plays the game the right way, he does.

0:14 – And emerging from behind a tree there’s an umpire, checking to see if the ball lodged in the poison ivy for a triple or into the water for a double….what, the ball’s IN PLAY??!?

0:15 – Yep. The right fjorder (Jonni Damonen) swiftly tosses a relay to one of his fellow outfjorders.

0:16 – Unfortunately, Ryän Raburninnen isn’t known for having the best “handle” in this sport

0:17 – Average water temperatures in Finland are colder than anywhere in the continental USA.  That’s because they’re measured in degrees Celsius.

0:18 – Look, there’s Jeano rounding the bases the right way

0:19 – Poor right fjorder takes his second plunge in the last five seconds.  Someone please fire up a sauna for ol’ Jonni.

0:20 – And there’s Sami flying like a Finn right behind him.  All this fumbling of the frigid fjord-frozen ball in right fjord has allowed them to finally move forward again.

0:21 – Nice flip by the right fjorder.  Maybe they should move him to second base, wherever the hell they put that in Finland.

0:22 – Nice use of the split screen for the fielding and baserunning portions of the play.  Might catch on for MLB telecasts if they ever tried it.

0:23 – Here comes Sami to his jubilant teammates….

0:24 – …PSYCH!!…

0:25 – …running up the third-base line without him

0:26 – The right fjorder pulls his hypothermic body up Tallinn’s Hill, his efforts having been to no avail.

0:27 – Why are they running out there with their bats?  I am so thoroughly confused.

0:28 – Led Zeppelin, the official sponsor of the third-base warning track.

0:29 – Those uniforms make these guys look like a NASCAR pit crew.  Waiting for one of them to hand Sami a champagne bottle to spray the place.

0:30 – Some guy in a blue jacket is taking a stroll in from left field, apparently oblivious to all the mayhem.

0:31 – This part of the field is also used for the Finnish Capture The Flag League.

0:32 – Finnish vodka is excellent.  Just ask the camera guy.

0:33 – Guy in blue jacket has a helmet on.  Must be from a different pit crew.

0:34 – Ebullient Finnish yelling.

0:35 – This part of the field was formerly used by the local Finnish Basketball Association team.  The team disbanded once it was discovered that someone forgot to put up an actual basket.

0:36 – The one guy with a green helmet comes towards the camera with his bat in ready position.  Must be the team’s enforcer.

0:37 – “HAYYYYY!!!”

0:38 – Another yell sounding like “BASEBALLLL!!!!”

0:39 – Coach about to give Sami a water bottle for all his efforts with the bat and on the basepaths (both clockwise and counterclockwise)

0:40 – Fun fact: one of those long Finnish words on Sami’s uni means “this space available for sale”.  I forgot exactly which one it was.

0:41 – At least Sami holds the water bottle correctly.

0:42 – How come there’s no left fjord?

0:43 – Fuzzy blue feet can only mean one thing — a mascot!  Wonder who/what they have for mascots in Finland?

0:44 – It’s the love child of these two!  Sweet!

0:45 – Not sure what that thing is over the bleachers behind home plate (home Frisbee?).  Looks vaguely aerodynamic.

0:46 – Someone obviously has a job that includes coordinating handtowels to these guys’ uniforms.  The age of specialization is not merely a North American phenomenon.

0:47 – Because Finnish baseballs are often contaminated with fjord-borne bacteria, used handtowels are the souvenir of choice.

0:48 – Eriko is like… what?

0:49 – Ignoring the two kids waving for the towel in the front, Sami fires a Hail Mary pass for the blonde in the top row.

0:50 – Notice all the parkas and heavy winter clothing on these fans.  Although the average game-time temperature in Finland is about 17°C, the temperature on this evening was only 10°C, which is just 10 degrees above the freezing point of the right fjorder’s uniform.

0:51 – Nobody bothered to man the lemonade stand in left field just past the bleachers.  Guy in the blue jacket probably just walked off with the lemons.

0:52 – Can the Finnish president override a vimpelin veto?

0:53 – Fun fact:  the official logo of Superpesis, the major league of Finnish baseball, is basically the same as the NBC peacock.

Thank you for watching, and have a nice day.


Hardball Retrospective – What Might Have Been – The “Original” 2002 Blue Jays

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 2002 Toronto Blue Jays 

OWAR: 51.4     OWS: 312     OPW%: .572     (93-69)

AWAR: 34.2      AWS: 234     APW%: .481     (78-84)

WARdiff: 17.2                        WSdiff: 78  

The 2002 “Original” Blue Jays breezed to the American League East title, vanquishing the Yankees by a nine-game margin. Toronto topped the American League in OWAR and OWS. Shawn Green (.285/42/114) registered 110 tallies, achieved his second All-Star appearance and finished fifth in the MVP balloting. Jeff Kent (.313/37/108) drilled 42 doubles and attained a career-high in home runs. Carlos Delgado belted 33 round-trippers and coaxed 102 bases on balls. John Olerud (.300/22/102) laced 39 two-base hits and collected the Gold Glove Award. In the midst of five straight seasons with a batting average above .300, Shannon Stewart sliced 38 doubles and scored 103 runs. Vernon Wells reached the century mark in RBI and added 34 two-base knocks in his first full season. The “Actual” squad featured 2002 AL Rookie of the Year Eric Hinske (.279/24/84) at the hot corner.

Jeff Kent placed 48th among second-sackers in “The New Bill James Historical Baseball Abstract” top 100 player rankings, while John Olerud secured the 53rd slot at first base.

Original 2002 Blue Jays                            Actual 2002 Blue Jays

STARTING LINEUP POS OWAR OWS STARTING LINEUP POS AWAR AWS
Shannon Stewart LF 2.37 18.47 Shannon Stewart LF 2.37 18.47
Vernon Wells CF 0.83 16.7 Vernon Wells CF 0.83 16.7
Shawn Green RF 6.18 32.07 Jose L. Cruz RF/LF 1.73 12.62
John Olerud DH/1B 4.64 25.92 Josh Phelps DH 1.46 9.8
Carlos Delgado 1B 4.76 25.97 Carlos Delgado 1B 4.76 25.97
Jeff Kent 2B 6.04 29.93 Dave Berg 2B 0.18 8.61
Alex S. Gonzalez SS 2.78 14.36 Chris Woodward SS 2.17 11.74
Chris Stynes 3B -0.02 3.46 Eric Hinske 3B 3.8 21.81
Greg Myers C 0.57 5.57 Tom Wilson C 0.43 5.88
BENCH POS OWAR OWS BENCH POS AWAR AWS
Jay Gibbons RF 0.59 11.97 Raul Mondesi RF 0.08 6.33
Chris Woodward SS 2.17 11.74 Orlando Hudson 2B 1.17 5.89
Craig A. Wilson RF 0.95 10.78 Felipe Lopez SS 0.08 5.8
Michael Young 2B -0.63 10.72 Ken Huckaby C -1.24 1.78
Josh Phelps DH 1.46 9.8 Joe Lawrence 2B -0.83 1.48
Orlando Hudson 2B 1.17 5.89 Dewayne Wise RF -0.42 1.39
Felipe Lopez SS 0.08 5.8 Jayson Werth RF 0.04 0.77
Brent Abernathy 2B -0.44 4.99 Homer Bush 2B -0.27 0.75
Abraham Nunez 2B 0.04 4.88 Darrin Fletcher C -0.44 0.64
Cesar Izturis SS -0.68 3.77 Brian Lesher 1B -0.5 0.23
Ryan Thompson LF 0.14 2.84 Kevin Cash C -0.14 0.08
Joe Lawrence 2B -0.83 1.48 Pedro Swann DH -0.18 0
Pat Borders DH 0.06 0.36
Mike Coolbaugh 3B -0.17 0.16
Casey Blake 3B -0.11 0.11
Kevin Cash C -0.14 0.08

Roy “Doc” Halladay (19-7, 2.93) warranted his first All-Star invitation and led the American League with 239.1 innings pitched. David “Boomer” Wells compiled 19 victories with a 3.75 ERA. Toronto’s superb bullpen staff was anchored by Billy Koch (3.27, 44 SV) and Jose Mesa (2.97, 45 SV). The setup corps consisted of Steve Karsay (3.26, 12 SV), Ben Weber (7-2, 2.54) and Kelvim Escobar (4.27, 38 SV).

Original 2002 Blue Jays                          Actual 2002 Blue Jays

ROTATION POS OWAR OWS ROTATION POS AWAR AWS
Roy Halladay SP 6.74 21.67 Roy Halladay SP 6.74 21.67
David Wells SP 3.99 14.79 Pete Walker SP 1.85 8.74
Woody Williams SP 3.2 9.65 Mark Hendrickson SP 1.23 4.01
Gary Glover SP 0.03 4.54 Esteban Loaiza SP -0.15 3.86
Mark Hendrickson SP 1.23 4.01 Justin Miller SP -0.23 3.4
BULLPEN POS OWAR OWS BULLPEN POS AWAR AWS
Billy Koch RP 1.44 18.37 Kelvim Escobar RP 0.53 9.14
Jose Mesa RP 1.28 12.4 Cliff Politte RP 1.05 6.49
Steve Karsay RP 2.01 11 Corey Thurman RP 0.54 3.66
Ben Weber RP 1.33 10.48 Felix Heredia RP 0.09 3.12
Kelvim Escobar RP 0.53 9.14 Scott Eyre RP 0.11 2.83
Mike Timlin RP 1 8.04 Chris Carpenter SP 0.41 2.73
Giovanni Carrara RP 0.62 6.77 Steve Parris SP 0 1.88
David Weathers RP 1.02 6.68 Scott Cassidy RP -0.43 1.67
Chris Carpenter SP 0.41 2.73 Dan Plesac RP 0.33 1.39
Graeme Lloyd RP -0.53 1.89 Brian Bowles RP 0.04 1.37
Scott Cassidy RP -0.43 1.67 Jason Kershner RP 0.12 0.65
Jose Silva RP 0.11 1.38 Pedro Borbon RP -0.07 0.48
Brian Bowles RP 0.04 1.37 Scott Wiggins RP 0.05 0.2
Mark Lukasiewicz RP 0 1.17 Pasqual Coco RP -0.13 0
Jim Mann RP 0.18 1.02 Brian Cooper SP -0.59 0
Carlos Almanzar SW 0.24 0.94 Bob File RP -0.47 0
Tom Davey RP -0.36 0.17 Brandon Lyon SP -0.56 0
Pasqual Coco RP -0.13 0 Luke Prokopec SP -0.91 0
Bob File RP -0.47 0 Mike Smith SP -0.45 0
Pat Hentgen SP -0.54 0
Brandon Lyon SP -0.56 0
Aaron Small RP -0.08 0
Mike Smith SP -0.45 0
Todd Stottlemyre SP -0.38 0

Notable Transactions

Shawn Green 

November 8, 1999: Traded by the Toronto Blue Jays with Jorge Nunez (minors) to the Los Angeles Dodgers for Pedro Borbon and Raul Mondesi. 

Jeff Kent 

August 27, 1992: Traded by the Toronto Blue Jays with a player to be named later to the New York Mets for David Cone. The Toronto Blue Jays sent Ryan Thompson (September 1, 1992) to the New York Mets to complete the trade.

July 29, 1996: Traded by the New York Mets with Jose Vizcaino to the Cleveland Indians for Carlos Baerga and Alvaro Espinoza.

November 13, 1996: Traded by the Cleveland Indians with a player to be named later, Julian Tavarez and Jose Vizcaino to the San Francisco Giants for a player to be named later and Matt Williams. The Cleveland Indians sent Joe Roa (December 16, 1996) to the San Francisco Giants to complete the trade. The San Francisco Giants sent Trent Hubbard (December 16, 1996) to the Cleveland Indians to complete the trade. 

John Olerud 

December 20, 1996: Traded by the Toronto Blue Jays with cash to the New York Mets for Robert Person.

October 27, 1997: Granted Free Agency.

November 24, 1997: Signed as a Free Agent with the New York Mets.

October 29, 1999: Granted Free Agency.

December 15, 1999: Signed as a Free Agent with the Seattle Mariners. 

Billy Koch

December 7, 2001: Traded by the Toronto Blue Jays to the Oakland Athletics for Eric Hinske and Justin Miller.

Honorable Mention

The 1995 Toronto Blue Jays 

OWAR: 27.1     OWS: 208     OPW%: .469     (76-86)

AWAR: 25.4       AWS: 168      APW%: .389    (56-88)

WARdiff: 1.7                        WSdiff: 40

The “Original” ’95 Jays plodded to a fourth-place finish in the AL East, eleven games behind the Orioles while the horrific “Actuals” placed 30 games behind the Red Sox. David Wells delivered a 16-8 record with a 3.24 ERA and made his first appearance at the Mid-Summer Classic. Jose Mesa (1.13, 46 SV) blossomed in the closer’s role, meriting second place in the Cy Young Award balloting along with a fourth-place finish in the MVP race. Derek Bell pilfered 27 bases and established personal-bests in BA (.334) and OBP (.385). Fellow outfielder Glenallen Hill clubbed 24 long balls and set career-highs with 86 RBI and 25 stolen bases. Geronimo Berroa clubbed 22 taters and knocked in 88 runs. Jeff Kent contributed 20 dingers and John Olerud socked 32 doubles.

On Deck

What Might Have Been – The “Original” 1902 Cubs

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

 


Evaluating the Career of Hanley Ramirez

Hanley Ramirez first came up with the Red Sox in 2005, had two plate appearances, and then was dealt to the Marlins.  He officially started his regular career in 2006, and didn’t look back for the next five years.  He has often been credited for the many tools that he has or had: speed, hitting for average, and hitting for power.  But rarely has he been credited for doing all at the same time.  This article is to show you, the reader, exactly how rare Hanley Ramirez has been, and how to appreciate him correctly.

Since he came up to the major leagues in 2006 with the Marlins, Hanley Ramirez has wowed us with his skill.  In the early stages of his career, he was a young shortstop with amazing speed, good hit skill, and pop in his bat.  In that rookie season, he hit a solid .292 with an unexpected 17 home runs, and, most surprising of all, he notched 51 stolen bases.  He skipped the dreaded sophomore slump in his next big-league campaign, matching his previous total of 51 swiped bags, while improving almost everything in his stats.  He hit an amazing 29 home runs in 2007, while knocking in 81 runs and accumulating an impressive 5.2 WAR.  The most impressive part about that 2007 season, though, was his amazing .332 batting average.  

At this point in his career, many analysts and fans predicted that this would represent his regular prime stats, and what outstanding stats they were.  Yet it was not to be.  Believe it or not, he got even better the next year, upping his homer total to 33 and improving both his walk rate and his ISO.  In addition, he raised his WAR to an astonishing 7.5.  Somehow, he did all this while dropping his BABIP 24 points, to ‘only’ .329, and stealing 16 fewer bases than in the previous year.  In his fourth year in the major leagues, his home-run and stolen-base totals both dropped below the 30 mark, but his average leaped up 40 points to .341!  His wRC+ also climbed 5 points to 149.  A less amazing year followed in 2010, but he was still impressive, hitting at a .300 clip with 21 homers and a 4.2 WAR.

In 2011, he ended his streak of incredible campaigns, hitting for only a .243 average with a paltry 10 home runs.  In his first year as a veteran in the major leagues, Hanley picked his homer total up to 24, but his average remained below .260.  Overall, it was a pretty dismal two-year span for Hanley.  He rebounded spectacularly the next year, though, hitting .345 with 20 homers for a new team, the Dodgers.  Unfortunately, Hanley’s homer total dropped to 13 in 2014, but he kept his average up to .288.  He also drove in 71 runs that year, making the year not a complete failure.  

He didn’t keep up his good streak for long, though.  In 2015, with the Red Sox, his average dropped back down to .249, while hitting 19 home runs.  Coming into 2016, Hanley must have tweaked something in his approach, because he had his first solid year in a long while.  With everything complete, he had 30 homers and 111 runs batted in with a .286 average.  That is a comeback.  It’s crazy, though, when looking at the journey he’s been through in the big leagues.  He’s hit for power, has stolen bases, and accumulated 7+ WAR — twice!  He did all this at the plate while playing the middle infield, corner outfield, and corner infield.

So now that the whole length and breadth of Hanley’s career has been touched upon, there is now a base on which his career can be evaluated.  Starting, of course, from the year he came up, it’s obvious from the overview above that Hanley was spectacular.  It’s certainly not normal for a player of his youth (he was 23 when he broke into the majors) to be successful upon immediate entry into the premier baseball league in the world.  So when looking at his statistics from that first year, it’s not too surprising to see that his BABIP in that first year was an unrealistically high .343.  That could mean many things.  The first thought that comes to mind when seeing a BABIP that high is “an extreme overdose of luck.” However, a whole season (700 plate appearances) is long enough that luck would wear off after less than half the season went by.  The luck theory seems even more ludicrous when looking at the next four years of his career.  In those four years, he averaged a BABIP of .345.  
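For readers who want to see the arithmetic behind BABIP, here is a minimal sketch. It uses the standard definition, (H - HR) / (AB - K - HR + SF); the stat line fed into it is hypothetical, picked only to land near the range discussed above, and is not Hanley’s actual season.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("stat line has no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line, used only to illustrate the formula.
example = babip(hits=185, home_runs=17, at_bats=633, strikeouts=128, sac_flies=4)
print(f"BABIP: {example:.3f}")
```

Home runs and strikeouts are removed because no fielder has a chance to make a play on them, which is why speed and contact quality are the usual suspects when a BABIP stays high for years.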

There is another well-documented theory that may be applicable to Hanley’s situation.  He could, like Paul Goldschmidt, have been hitting so many line drives that such a high BABIP is easily achieved.  However, this theory falls apart once his line-drive percentage is considered.  He averaged a line-drive percentage around 19 percent, compared to Goldschmidt’s ideal 24 percent.

This is not a case in which the easy way out can be taken by saying “that’s just who he is, he just hits for a high BABIP!”  Indeed, it is not who Hanley is: after those five years, his BABIP dropped to just .275 and .290 over the next two years.  Thankfully, this question has a very simple answer, one that might have slipped through the cracks of many a research team.  So easy an answer suffices, even in a day when complicated statistical answers are some of the only answers accepted.  This is one of the few cases in which the statistical answers prove insufficient, so an old tool is called upon in their place.

Simply put, the answer is speed.  For the first five years of his career, Hanley had unbelievable speed, evidenced by his 196 stolen bases in that span.  Of course, speed is a bigger factor than just the occasional slow roller between first and second that is beaten out.  Speed means the opposing team pulling in their third baseman in case of a bunt, or pulling in the whole infield so the speedster doesn’t get that aforementioned infield hit (both of these scenarios make a hit easier to come by, because it’s extremely hard to stop a hard-hit ball when fielders are pulled to within 75 feet of home plate).  Speed also means getting hittable pitches, because pitchers do not want to walk a speedster and give him a chance to steal a base.

This theory of speed makes even more sense when it’s seen that as soon as Hanley’s speed began to diminish, he stopped getting a high BABIP.  His lack of speed in the 2011 and 2012 seasons affected his whole offensive output in that span.  Over those two years, he hit for an average of .250 and stole only (for him) 41 bases.  His rebound the next year (.345 avg., .363 BABIP) was due in large part to an uncharacteristic line-drive percentage of 22 percent, and a hard-hit percentage close to 50.  His horrible season in 2015 most likely had many causes.  During that year, he had almost no remaining speed, a chronic inability to hit the ball hard, and an array of injuries.  However, he rebounded this year, accumulating 30 homers while hitting a solid .286.

How he did that, it’s hard to know.  He barely improved his line-drive and hard-hit percentages, and certainly did not suddenly gain speed.  It’s now safe to say that somehow, some way, Hanley has completely revamped his approach to hitting.  Now that his speed is gone for good, he is still managing to stay extremely productive without relying on it.  Of course, he’s not even close to being as productive as he was during that five-year stretch, but he has managed to do what few speedsters have done in the past: stay productive after the age of 31, when speed starts to diminish.  Many a speedster has fallen prey to this ailment called aging, including (but not limited to) Vince Coleman, Carl Crawford, and Scott Podsednik.  Of course, there are exceptions, most notably Rickey Henderson and Ichiro Suzuki.  So Hanley has joined an elite club, one that definitely does not fit his style of play.

Over his career, Hanley has proven to be able to hit for power, average, and line drives, while also running well for a while.  Out of the five tools in baseball, three are for hitting.  Hanley could be the image of each, from different points in his career.  

Speed: he had two straight 51-steal seasons.  

Average: he has a .295 career average over his 11-year tenure in the major leagues.

Power: he’s accumulated seven seasons of 20+ home runs.  

He truly is and has been one of the most talented players in the major leagues.  Despite this, Hanley remains one of the most underappreciated players in the game.  Not many players have done what he has done in his career, yet he is viewed as a good comeback player, not as the personification of the hitting tools.


Defense Is Cheap — and It Wins

One of the most common phrases in all of sports is “defense wins championships.” Defense isn’t flashy; it doesn’t put people in the seats (unless you’re a desperate Twins fan wanting to see Byron Buxton do more of this — or this). People like to see the home runs, the strikeouts. People also like to see the diving plays, but diving plays are a poor indicator of a team’s total defensive quality. So even the plays on defense that do put people in the seats aren’t indicative of a team’s overall level of defense. Other sports are the same way. People don’t realize the ins and outs of NBA defenses; they only see the steals and the lockdown plays — or lack thereof. NFL fans love to see big hits, but sometimes these big hits could be avoided if a team had defended the play better and stopped the ball carrier earlier.

Yes, it is true the nuances of defense can be monotonous, and this is true through all sports. Another factor is the difficulty of quantifying defensive skill. Some metrics, like RPM (shameless plug to my boy Ricky Rubio, clearly a top-5 PG), try to do this for basketball. But in baseball, defense really is quantifiable, using different metrics that can track how effective a defensive player or team is against league average. For example, read up on UZR, just one of the metrics that can put a number on a defense.

I came to this thinking on the undervaluation of defense through a different path. I had always wondered if an incredible defense could bail out an average pitching staff. I had always been interested in this facet; to reminisce, I once created an outfield of Torii Hunter, Rocco Baldelli, and Carl Crawford on MVP Baseball 2004. These were the best and fastest fielders in the game, and it seemed like they could get any fly ball. As much as I want to credit EA Sports for making an accurate game, I obviously cannot deduce the real-world effectiveness from a video game. Instead, I turned to the numbers.

To quantify how much a defense could “bail out” its pitching staff, I looked at the team’s average ERA compared to its average FIP. The difference between these numbers can roughly quantify how much a team’s defense (and other factors) moves its pitching results away from what we would expect. For example, if a team had a FIP of 4.00 and an ERA of 3.50, this would indicate that a good defense was able to reach more balls than an average defense, meaning the team’s ERA should be lower, as it recorded more outs than we would expect. The opposite, a team’s ERA being greater than its FIP, would indicate that a poor defense hurt the pitching staff’s performance, as the fielders should have been able to get to more balls than they did. To sum up, my hypothesis was that the teams with the largest FIP-ERA differences had great defenses, while teams with the lowest FIP-ERA differences (negative values) had poor defenses. Now, I understand that many factors outside of defense can influence ERA, and that FIP does not perfectly match what a pitcher’s ERA would be with an average defense, but these anomalies should largely cancel out in a large enough data set.
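As a quick illustration of the FIP-ERA difference, here is a minimal sketch using the 4.00 FIP and 3.50 ERA example from the paragraph above. The function names and the 0.05 tolerance for calling a gap “roughly neutral” are my own assumptions for illustration, not part of the original study.

```python
def fip_era_diff(fip, era):
    """FIP minus ERA. Positive means the team allowed fewer earned
    runs than its fielding-independent numbers would suggest."""
    return round(fip - era, 2)


def describe(diff, tolerance=0.05):
    # `tolerance` is an arbitrary illustrative cutoff (an assumption).
    if diff > tolerance:
        return "ERA beat FIP: defense (and other factors) likely helped"
    if diff < -tolerance:
        return "ERA trailed FIP: defense (and other factors) likely hurt"
    return "roughly neutral"


diff = fip_era_diff(fip=4.00, era=3.50)
print(diff, "->", describe(diff))
```

A single team-season gap like this mixes defense with luck, park, and sequencing, which is why the comparison only becomes meaningful across many teams.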

For the data, I measured playoff-contending teams (at least 85 wins) since 2002 (the furthest back I could get a value for a defensive rating) through 2015. From these teams, I parsed values for ERA, FIP, and defense, as well as the team’s payroll, runs scored, runs allowed, and run differential.

While taking my initial walks through the data, I saw two types of teams on this list. There were teams that scored few runs, but allowed even fewer, and there were teams that scored a host of runs, although they conceded a large, but lesser amount. The teams that scored little and allowed less had a common trend: they had great defenses and ERAs generally lower than FIPs. On the other hand, the teams that blasted the seams off the ball and had no problems putting runs on the scoreboard tended to have poor defenses, and their FIP-ERA difference was negative.

Using this data, I decided to run a regression analysis between a team’s defense and this FIP-ERA difference. There was a solid relationship between these two variables, with an r-squared of 0.48. This indicates that a team’s FIP-ERA difference tends to increase as the skill level of its defense increases.

[Figure: FIP-ERA difference vs. team defense]
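For anyone who wants to reproduce this kind of check, here is a minimal sketch of how an r-squared can be computed for one predictor and one outcome. With a single predictor, the r-squared of a simple linear regression equals the squared Pearson correlation, which is what the function returns. The sample values are hypothetical placeholders, not the team data used in this article.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical team-seasons: defensive rating and FIP-ERA difference.
defense_rating = [-30.0, -10.0, 0.0, 15.0, 40.0]
fip_minus_era = [-0.25, -0.05, 0.02, 0.10, 0.30]
sample_r2 = r_squared(defense_rating, fip_minus_era)
print(f"r-squared: {sample_r2:.2f}")
```

The same function works for the other pairings in this piece (FIP-ERA difference against runs allowed, defense against runs allowed); note that r-squared drops the sign, so the direction of the relationship has to be read from the slope or the plot.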

Now we know correlation does not imply causation, but this result does show how strong the link between the two is. The better a team’s defense is, the more likely it will be able to positively influence its pitching staff’s performance. These were teams like the 2002 Atlanta Braves, the 2011 Tampa Bay Rays, or the 2004 and 2005 St. Louis Cardinals. These teams didn’t have great offenses, but they had great defenses, they had good team ERAs, and they prevented teams from scoring runs.

On the other hand, there were teams like the 2003 and 2004 Red Sox as well as the mid-2000s Yankees. These were teams with massive payrolls that paid a premium for a punishing lineup. These lineups, however, lacked defensive talent, causing their pitching staffs to underperform expectations, as their teams’ ERAs were higher than their FIPs.

So how related is this FIP-ERA difference to the number of runs allowed? Pretty strongly, with an r-squared of 0.46. Again, this is a strong relationship, this time negative, indicating that as a team’s FIP-ERA difference increases, the runs that team allows decrease.

[Figure: FIP-ERA difference vs. runs allowed]

To reinforce this relationship, I looked at defense and runs allowed. Again, the two showed a good, not great, relationship, with an r-squared of 0.28.

[Figure: runs allowed vs. team defense]

From these relationships, we can deduce that as a team’s defense rises in skill, the runs it allows tend to decrease and its FIP-ERA difference tends to increase. Similarly, as a team’s FIP-ERA difference increases, the number of runs it allows decreases. From these relationships, we can conclude that these three variables are related.

As a team’s defense improves, it can positively influence the effectiveness of its pitching staff and decrease its runs allowed. This may seem like common sense, and it probably is.

Now when we look at Bill James’ Pythagorean Win Expectation and other similar formulae, we notice that a team’s expected winning percentage does not depend on the runs they score alone, but rather on how their runs scored compare with their runs allowed. So yes, if you want to, you can construct a team like the Bronx Bombers and spend millions to assemble some of the best lineups in recent history. If you do that, you’ll score a host of runs, and with decent pitching and decent fielding (or below-average defense and good pitching, like those mid-2000s Yankees teams), you’ll be able to outscore your opponents and have a high run differential.
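To make the Pythagorean idea concrete, here is a minimal sketch using the classic form of the formula, RS^2 / (RS^2 + RA^2). The exponent of 2 is the original version (later refinements use slightly different exponents), and the two run totals below are hypothetical teams invented for illustration, not real seasons.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    if runs_scored < 0 or runs_allowed < 0 or (runs_scored == 0 and runs_allowed == 0):
        raise ValueError("run totals must be non-negative and not both zero")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Two hypothetical teams: one slugs, one prevents runs.
teams = {
    "big offense": (900, 800),
    "run prevention": (700, 620),
}
for name, (scored, allowed) in teams.items():
    pct = pythagorean_win_pct(scored, allowed)
    print(f"{name}: {pct:.3f} expected win%, about {pct * 162:.0f} wins")
```

The two made-up teams score 200 runs apart, yet come out with nearly the same expected record, because what the formula rewards is the gap between runs scored and runs allowed rather than either total by itself.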

Or, you can assemble a team that will limit the amount of runs you’ll give up, by investing in defense. You will be able to compensate for average hitting and pitching, as you will boost your pitching staff’s effectiveness, and you will reduce the need for your offense to put up great numbers. Again, we have seen teams like this. The 2002 Braves were a combination of good defense, great pitching (aided by that defense), and average or perhaps even below-average offense; yet, this team won 101 games by scoring a mediocre 702 runs on the season (the average for the NL was 720 that season, 747 for all of baseball). Similarly, the 2011 Tampa Bay Rays put up 707 runs, against an American League average of 723, and still put up 91 wins and made the playoffs with good pitching and better defense. In fact, FIP would indicate their pitching was expected to perform right at American League average, a 4.08 ERA, yet they posted a 3.58 ERA.

Moreover, in that same season, the Los Angeles Angels won 86 games on just 667 runs, as they had even better pitching than the Rays. FIP would indicate the Angels’ pitching would be around a 3.94 ERA with league-average defense, but it was at a 3.57 ERA. The impact of good pitching paired with defense clearly is high, and I can’t think of a better final example than the 2010 World Series-winning San Francisco Giants, who couldn’t have fit this structure any better: great pitching, great defense, and below-average offense.
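To get a feel for what those ERA gaps mean in runs, here is a rough back-of-the-envelope sketch. It converts the FIP and ERA figures quoted above into earned runs over a season, assuming about 1,450 innings pitched per team; that innings figure is my assumption for illustration, not a number from the data set.

```python
ASSUMED_INNINGS = 1450.0  # assumption: roughly one team-season of innings


def earned_runs_saved(fip, era, innings=ASSUMED_INNINGS):
    """Earned runs implied by the FIP-ERA gap over the given innings.
    ERA and FIP are both scaled per nine innings."""
    return (fip - era) * innings / 9.0


# FIP and ERA values as quoted in the text for the 2011 Rays and Angels.
for team, fip, era in [("2011 Rays", 4.08, 3.58), ("2011 Angels", 3.94, 3.57)]:
    saved = earned_runs_saved(fip, era)
    print(f"{team}: about {saved:.0f} earned runs better than FIP suggests")
```

This treats the entire gap as run prevention beyond what the pitchers controlled, which overstates defense’s share to the extent that luck and sequencing are also in there.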

So when one is trying to construct a team, and, unlike with the Yankees or Red Sox, money is a constraint, one might want to consider investing in defense. I say this because I looked directly at the relationship of a team’s payroll and their defensive ability, and it actually produced a negative relationship.

[Figure: team payroll vs. team defense]

I know this data may be influenced by the fact that salaries have increased essentially every year from 2002 to 2015, but if this truly did influence the graph, it would show one of two things. Teams recently may have lessened their focus on defense and spent on hitting and pitching (explaining why defense-oriented teams had smaller payrolls); or, even with rising payrolls, teams have still been able to assemble winning rosters by focusing on defense. Whether it is the first condition or the second, or a combination of both, perhaps defense is undervalued in today’s MLB. I doubt I’m the first to figure this one out, but the Cubs have far and away the best defense in baseball. The Red Sox and Indians have stellar gloves as well, forming a solid second tier of defense that has put them in playoff position. So maybe Jason Heyward’s contract shouldn’t look so bad after all.

You don’t have to score a ton of runs to be a playoff baseball team. You just have to score more than the other team does, which can be done by limiting the number of runs they score. It may seem like common sense, but common sense eludes us all at times.

There are many ways to construct a baseball team, and this might be just one more. And for stingy owners, it wouldn’t break the bank.


Has Tyler Flowers Finally Blossomed?

As expected, it was mostly a miserable season for the rebuilding Atlanta Braves. The team struggled mightily, especially on offense. The Braves scored the second-fewest runs in baseball. They owned an 86 wRC+, third-lowest in MLB. In fact, they only had two hitters with a wRC+ of 100 or higher. The first is unsurprisingly Freddie Freeman, who sat at a sterling 153 wRC+. In second, there is a modest surprise: it’s Tyler Flowers, who sat at a 111 wRC+.

Rebuilding teams generally trot out their top prospects regularly, but they also play high-upside guys they signed off of the scrap heap. Flowers fits the latter description. Although he was drafted in the 33rd round, he raked in his first three professional seasons:

Season  Team         G    PA   HR  SB  BB%     K%      ISO    BABIP  AVG    OBP    SLG    wOBA   wRC+
2006    Braves (R)   34   150   5   0  10.70%  20.00%  0.186  0.326  0.279  0.373  0.465  0.389  135
2007    Braves (A)   106  445  12   3  11.00%  16.60%  0.190  0.339  0.298  0.378  0.488  0.387  133
2008    Braves (A+)  122  520  17   8  18.80%  19.60%  0.206  0.342  0.288  0.427  0.494  0.415  154

His first full season in 2007 produced an awesome 133 wRC+ and led to Flowers’ first prospect ranking. Baseball America named him the Braves’ 12th-best prospect after that year. He didn’t get another chance to be ranked in the Braves system after 2008 though, because he was traded right after the season ended. He headlined a package of prospects that went to the White Sox for Javier Vazquez and reliever Boone Logan. Vazquez went on to pitch 219.1 innings with a 2.87 ERA the following season for the Braves, and Boone Logan would go on to become a pretty solid lefty specialist (although he wasn’t effective for the Braves).

The other prospects in the deal (Jonathan Gilmore, Brent Lillibridge, and Santos Rodriguez) were not as highly regarded as Flowers. Soon after the deal was completed, the post-2008 season prospect rankings were released by Baseball America. Flowers was ranked the fourth-best prospect in the White Sox system and the 99th-best prospect in baseball. Lillibridge came in at eighth in the organization, Rodriguez came in at 18th, and Gilmore came in at 21st.

The other prospects would go on to become non-factors. Gilmore and Rodriguez have never reached the majors. Lillibridge has a 60 wRC+ in 784 MLB PAs and a negative defensive value, netting him a career WAR of -1.7.

Meanwhile, Flowers steadily climbed up the organizational ladder. His first season with the White Sox was great in Double-A and Triple-A:

Season  Team             G   PA   HR  SB  BB%     K%      ISO    BABIP  AVG    OBP    SLG    wOBA   wRC+
2009    White Sox (AA)   77  317  13   3  18.00%  24.00%  0.246  0.383  0.302  0.445  0.548  0.444  177
2009    White Sox (AAA)  31  119   2   0   8.40%  26.90%  0.152  0.394  0.286  0.364  0.438  0.363  126

That year, he even earned a September call-up. After the season, BA ranked him as the White Sox’s No. 2 prospect and 60th overall, and FanGraphs ranked him as the White Sox’s best. Unfortunately, in his next season, Flowers only managed a 108 wRC+ in Triple-A in 412 PAs. His strikeout rate escalated to 29.4%. Still only 25, he improved in his next season, garnering a 148 wRC+ in 270 Triple-A PAs (although his strikeout rate was a staggering 31.1%). This warranted Flowers’ first extended look in the majors. He was given over 100 PAs in each of the next five seasons, but he could never quite reach his potential. He showed power at times, with a .199 ISO in 282 PAs in his first two years, boosted by 12 homers. However, with a walk rate below 6% in the next three seasons, coupled with a K-rate of over 30% in two of those three seasons, Flowers could never get on base at a solid clip. To make matters worse, his power bottomed out. His ISO shrank to .118 last year in 361 PAs. Here are his overall offensive numbers with the White Sox:

PA    H    2B  3B  HR  R    RBI  SB  CS  BB%    K%      ISO    BABIP  AVG    OBP    SLG    wOBA   wRC+
1360  279  50  2   46  119  142  2   5   6.30%  33.20%  0.155  0.311  0.225  0.288  0.380  0.295  84

So, despite tallying 27.3 defensive runs above average (according to FanGraphs) in his first five seasons, the White Sox non-tendered Flowers after 2015 because of his poor offensive output. The Braves (again!) scooped him up for a mere $5.3 million guaranteed over two years. That gamble seems to have paid off, because Flowers had his best offensive season in the majors this year. In 325 PAs, his walk rate is back up to 9%, above league average and his second-best in a season. His strikeout rate is down to its lowest ever, at 28%. His ISO, though still below league average, is up 33 points. His BABIP has skyrocketed to .364, the highest of his career. All of this has led to a .270/.357/.420 triple slash, with a .338 wOBA and a 110 wRC+. What’s going on? Has Flowers made any changes? Is he finally going to reach his potential? Let’s find out.

First, let’s take a look at Flowers’ plate discipline. His O-Swing%, at 27.2%, is his lowest since 2011. That puts him in a tie for 79th-lowest out of the 266 hitters with at least 300 PAs this year. His below-average O-Swing% paired nicely with an above-average Z-Swing% (67.9%). He has the 63rd (out of the 266 hitters) best differential in those two categories (O-Swing minus Z-Swing). Basically, Flowers has been laying off of balls and swinging at strikes.

Possibly because he was swinging at better pitches, Flowers made much more contact. His swinging-strike rate (percentage of swings and misses against all pitches he has seen) dropped to 11.6%, easily the lowest of his career. His contact rate (percentage of contact against all swings) rose to 74.6%, a career best as well. While these two marks are still below average, they represent a significant improvement for Flowers.

Better selection seems to have led to better contact quality for Flowers. This year, he posted easily the lowest Soft% (13%) and highest Hard% (44.3%) contact percentages of his career. Using the sample of 266 hitters from earlier, Flowers tied for the 17th-lowest Soft%, and he had the fourth-highest (!) Hard% (just above teammate Freddie Freeman!). Statcast agrees wholeheartedly that Flowers improved his contact quality. He had the fifth-highest (!) average exit velocity among the 272 hitters with at least 170 batted-ball events this season. He added 3.2 MPH to his average exit velocity since last year. Statcast also says that Flowers tied for the fifth-highest (!) estimated swing speed out of the 294 hitters with at least 150 batted-ball events this year. In addition, he also dropped his popup rate (IFFB%) by more than 50% from last year. Lastly, his Pull% dropped a ton this year. He tied for the 38th-lowest Pull% among the sample of 266 hitters from earlier. Since he doesn’t pull many grounders, it’s harder to shift on him. Therefore, he’ll get more base hits on grounders. These improvements make it look like Flowers can maintain a high BABIP.

While these are all good developments, part of his improving plate discipline may just be because Flowers saw his lowest percentage of pitches in the zone since 2011 (45.8%), so it was easier for him to take more walks. In addition, many of these improvements are so far beyond anything Flowers has ever done in the majors that I’m guessing some regression is in order, especially in these areas:

Period  O-Swing%  Contact%  SwStr%  Soft%  Hard%  IFFB%  Pull%
2016    27.2%     74.6%     11.6%   13.0%  44.3%  5.3%   34.9%
Career  31.0%     69.2%     15.1%   18.6%  33.1%  10.5%  41.8%
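To see at a glance how far each 2016 rate sits from his career mark, here is a small sketch that subtracts the career row from the 2016 row. The values are copied from the table above; the dictionary and function names are just illustrative.

```python
flowers_2016 = {
    "O-Swing%": 27.2, "Contact%": 74.6, "SwStr%": 11.6, "Soft%": 13.0,
    "Hard%": 44.3, "IFFB%": 5.3, "Pull%": 34.9,
}
flowers_career = {
    "O-Swing%": 31.0, "Contact%": 69.2, "SwStr%": 15.1, "Soft%": 18.6,
    "Hard%": 33.1, "IFFB%": 10.5, "Pull%": 41.8,
}


def gaps(season, career):
    """Percentage-point gap (season minus career) for each shared stat."""
    return {stat: round(season[stat] - career[stat], 1) for stat in season}


diffs = gaps(flowers_2016, flowers_career)
for stat, gap in diffs.items():
    print(f"{stat:<9} {gap:+.1f} points vs. career")
```

The bigger the gap, the more room there is for the number to drift back toward his career level, which is the regression concern raised above.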

Another knock on Flowers: generally, exit velocity leads to more power, but most of the good numbers for Flowers there have come from his exit velocity on grounders, which won’t lead to more power. He had the third-highest average exit velo on grounders, but only the 26th-highest on fly balls plus line drives. However, 26th out of 272 is still good.

Despite the high average exit velocity, Flowers had the 19th-highest rate out of 272 in terms of barrel hits/batted-ball events (which is still good, but not quite as good as the other exit-velo leaders). This is another reason why Flowers may have a lower-than-expected power output.

Overall, there were definitely some encouraging signs from Flowers this year. He was more disciplined and he made more and better contact. His power should improve if he keeps hitting the ball hard and swinging at good pitches. In addition, although he had a negative Defensive Runs Added this year for the first time, his framing has improved tremendously in the last couple of years. He saved over 13 runs this year (fourth-best in the majors) after saving over 22 last year (second-best).

Flowers’ success in the minors supports his success this year somewhat, but then again, this is his first above-average offensive season in the majors (in six tries), and he’s not getting any younger (he’s 30). Furthermore, since BABIP is volatile, even for hitters with great contact quality like Flowers, it will be hard for him to be consistently good, unless his power improves (which it probably should) and he maintains his strides in plate discipline. He’ll probably be given enough at-bats for us to find out, given the Braves’ level of terribleness and his defensive prowess.

Data is from FanGraphs, Baseball America, StatCorner, and Baseball Savant.

Thanks for reading!