The Giants Don’t Need an Overhaul, But an Upgrade

October 22, 2016

The Giants started their 2016 campaign with a 57-33 record before the All-Star break, then finished 87-75. There were plenty of downfalls in the second half of the season, but ultimately the bullpen led the Giants to their fate.

In the first half of the season the combined ERA of the bullpen was 2.27, with 26 saves and a K/9 of 9.7. This being said, they had 42 save opportunities, which means they blew a save 38% of the time. In the second half of the season they combined for a 2.85 ERA, with 17 saves and a K/9 of 8.4. They blew 13 saves in 30 opportunities during the second half, which means they blew a save 43% of the time.
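For anyone who wants to check the arithmetic, here is a quick sketch in Python. The function name is just illustrative; the save and opportunity counts are the ones quoted above.

```python
def blown_save_rate(saves, opportunities):
    """Share of save opportunities that were not converted into saves."""
    return (opportunities - saves) / opportunities

# Counts quoted in the paragraph above.
first_half = blown_save_rate(saves=26, opportunities=42)   # 16 blown of 42
second_half = blown_save_rate(saves=17, opportunities=30)  # 13 blown of 30

print(f"First half: {first_half:.0%}, second half: {second_half:.0%}")
```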

The bullpen was heavily criticized in the second half of the season due to the team’s inability to replicate the same win rate they saw in the first half. However, the bullpen was only slightly better in the first half than it was in the second half.

To me, the Giants were in dire need of acquiring a threat in the bullpen before the trade deadline approached. They went after Will Smith, who came in to the Giants’ pen with a 2.12 ERA, 7.9 K/9 and three blown save opportunities. With the Giants he had an ERA of 2.94, a 12.8 K/9 and a blown save. He was not able to convert a save all season, and although he proved to be a nice piece in the bullpen in hold situations, he was not a guy who could come into the 9th inning and dominate the game.

In the postseason the Giants were 0/2 in save situations and, in their final game against the Cubs, their bullpen collapse was maybe the worst the league has ever seen in the playoffs. However, their rookie Ty Blach came in for 3.2 innings of relief during the postseason and did not allow an earned run. He looked promising at the end of the regular season and pitched well in high-pressure situations during October baseball. It was surprising to see him and Santiago Casilla sit out their final game, as they watched their bullpen drop four runs in the 9th. Furthermore, we saw Clayton Kershaw close the Dodgers’ final game against the Nationals to move on to the NLCS. It would have been interesting to see what kind of performance Madison Bumgarner could have shown the Cubs’ batters in that final inning.

Finally, with the veteran relievers of Javier Lopez, Sergio Romo and Casilla needing new contracts for the 2017 campaign, and the Giants in need of finding someone who can come into a 9th inning and pose a legitimate threat, it will be interesting to see what the team does in the offseason to improve their bullpen. Here are my top five predictions for the Giants’ next closer.

#1: Kenley Jansen:

It is unlikely that Aroldis Chapman will be looking for a new home this offseason, as he looks comfortable in Chicago and will have a hard time finding a team with that amount of talent. Jansen, however, may flee from the aging Dodgers, especially if someone is willing to pay. The Giants will have a bit of salary space to work with and would benefit greatly from this signing.

#2: Mark Melancon:

Although Melancon is a few steps below the elite Jansen and Chapman, he showed he can work a 9th inning as well as anyone this season. He may be a bit more team-friendly as far as salary space, and that may be intriguing to the Giants who will be looking to add a heavy-hitting left fielder.

#3: Jonathan Papelbon:

Papelbon was replaced by Melancon for the Nationals’ closing position in the second half of the 2016 season. He had a great first half, and showed he is capable of being a dominant closer in the MLB. However, his fight with Bryce Harper in 2015 and his rough second half of the season may make him a risky candidate. This may lower his cost and if the Giants are unable to sign Jansen or Melancon, they would be smart to see what Papelbon could do for their bullpen.

#4: Derek Law:

Derek Law debuted in 2016 and had a pretty good campaign. With a 2.13 ERA in 55 innings of relief, he may have a shot at being the Giants’ closer. However, it would be unlikely for him to start the 2017 season off as the Giants’ closer, unless they are unable to sign someone to fill that duty this offseason. He is an unlikely candidate, but if he can improve from his 2016 season, there is no reason he would not be able to become a legitimate MLB closer.

#5: Aroldis Chapman:

Chapman will likely return to the Cubs, especially if they make it to the World Series this October. However, he has been on three teams in the past two years, and if the Giants are able to show him more money than the Cubs, they might be able to acquire the hard-throwing lefty. If they do, they might lose the power they need to fill left field but they would come into the 2017 season looking stronger than they did a season ago.

The Non-Decline and Fall of the San Francisco Giants

by 1908

October 21, 2016

The Chicago Cubs, hinting that this year they may have magick stronger than The Goat, recently brought the San Francisco Giants’ even-year playoff dominance to an end. It was an offensively offensive series; add the two teams’ OPS together and you’re just 100 points better than David Ortiz. The low-velocity Giants staff struck out a batter an inning, and both lineups walked at a lower rate than the unwalkable Royals. My working theory was that this series represented the final demise of the already waning power of the current edition of the Giants, and that the next chart-topping version of Big Head Bruce and the Monsters would have mostly new musicians. Turns out that this theory is only partially correct.

Your 2016 San Francisco Baseball Giants were actually a little better than the world-beating 2014 squad, at least when resort is had to statistics:

Stat                      2016 (MLB rank)   2014 (MLB rank)
Position Player fWAR      26.7 (4)          23.0 (9)
SP fWAR                   15.0 (5)          10.1 (21)
RP fWAR                   2.1 (22)          1.4 (24)
Position Player wRC+      98 (t12)          99 (9)
SP FIP-                   96 (t7)           104 (19)
RP FIP-                   97 (20)           98 (18)
Run differential/game     +0.51             +0.31

Let’s pause a minute to consider the bullpen numbers, which are the very essence of “meh” both years. The Giants have had the reputation of having a good, cheap bullpen. It’s certainly cheap: Sergio Romo is the plutocrat of the unit at a relatively unimposing $9 million. But “good” is more of a stretch; the Giants relievers have delivered value pretty much consistent with what they’ve been paid.

Some commentators have carpeted Bochy for his bullpen usage during the NLDS, but (perhaps because I’m not actually a Giants fan) I take a longer view. The miscellaneous roadies Big Head Bruce has had to work with will hardly make anyone forget The Nasty Boys, but he has often been able to squeeze value out of them when it’s mattered most. In order to maximize value out of this motley crue (I’m in town all week — try the garlic fries) Bochy has had to be very active in the late innings, and the more decisions any manager has to make, the more that will go wrong.

Giants general manager Brian Sabean has correctly recognized that in Bruce Bochy he employs one of the best tacticians in the game today. Sabean has maximized the value of this skill by handing Bochy a collection of misfit bullpen toys and saying “here, you figure this out.” On most nights Bochy does, but every once in a while he fails, as happened in the star-crossed six-pitcher 9th in Game 4. If you want to see what a bullpen meltdown looks like in graphic form, here it is. (Younger or more sensitive Giants fans are advised not to click on that link.)

My guess is that Bochy has had a few other bad bullpen nights, but most of those have happened when the East Coast was already asleep. When you happen to have a bad night nationwide, people may be a little too inclined to draw definitive conclusions. (I do not cut Buck Showalter this kind of slack. Bochy has a bunch of semi-interchangeable parts that present numerous non-obvious choices. Buck doesn’t.)

But back to our regularly scheduled program: the 2016 Giants were, by most measures, a better squad than the 2014 one. This is a roster that’s peaking, and perhaps fell victim to what will soon be a storied Cubs team, or (more prosaically) to the bad luck inherently possible in a short series. So the Giants can look forward to an extended run of playoff contention!

Or not. The Giants are heading in full sail toward the dragon-pocked part of the map. This is an old team: the Giants have the sixth-oldest set of position players in the majors and the oldest pitching staff. They have just two regular players under 27, Madison Bumgarner (still just 26) and Joe Panik (25). To borrow a Casey Stengel line, in 15 years Bumgarner may be in the Hall of Fame. In 15 years, Joe Panik will be 40.

The Giants’ farm will provide little aid. Their system has just two MLB top-100 prospects, with the best being the positionless Christian Arroyo at #79 (though the excellent Bernie Pleskoff is less hostile to his defense than I am). Austin Slater isn’t in the top 100, but he raked at AAA at age 23 with good plate discipline, so he may be able to fill the outfield spot Angel Pagan is likely to vacate.

On the bright side, the contracts of Jake Peavy and Pagan expire this year, taking $26 million off the books. Romo and Santiago Casilla will be departing for broadcasting careers as well, taking $15 million more of liabilities with them. The Giants need one or two outfielders and starting pitching, but especially with respect to the latter, next year’s free-agent class would make a cow laugh. The 2018 list is a better one, but between now and both free-agent classes likely interposes a new collective bargaining agreement, so there’s enough fog to compel Sabean to operate his lights on low beam.

And the competition isn’t sitting still. Regardless of how the hated Los Angeles Dodgers fare in the NLCS, they are poised to compete for a while. The Rockies have an exciting core of young talent, even if casual Rox fans despair of the team at the moment. The Outlaw A.J. Preller merits a blog post all his own (say, there’s an idea!), and while the Padres seem to have a bit of transmission loss between talent and wins, some improvement there is possible as well, especially if Tyson Ross can make a successful return from thoracic outlet surgery. (What? You say there’s another team in the NL West? Hmm … I’ll research that and get back to you.)

So the Giants may be stalling or even slipping backward in a division where at least two of the teams are making progress. The Giants have a good but mostly older core which could use the kind of help that free agency and prospect trades are unlikely to provide in 2017. So 2016 may indeed be the last gasp of this once-in-a-while mighty franchise, at least for the moment. Sabean has pulled a whole warren of rabbits out of his hat during his long tenure, but in 2017 he’s going to have to dig deep.

Perhaps there will be a powerful goat looking for work …

Dr. Hendricks and Mr. Gray

by Juan Pablo Zubillaga

October 19, 2016

Randomness and circumstances are important driving forces in everything that happens in the world. Although they usually work hand in hand with our own actions and decisions, they have the ability to pick you up when you hit the jackpot at the casino, or throw you down when your car gets crushed by a falling tree (hopefully you’re comfortably sleeping in your bed when that happens). They can also be the difference between a pitcher having an average season on the mound, and having an outstanding one. Such is the case with the seasons Jon Gray and Kyle Hendricks had this year.

I’m not going to make the argument that these two pitchers performed equally well this season, with the main differences being random chance and circumstances, because they didn’t. Hendricks was the better pitcher; it just wasn’t the 2.48-run difference their ERAs show. The similarities between the two performances can be summarized in basically two stats. If we take a look at xFIP and SIERA (two important ERA estimators available here at FanGraphs), Hendricks’ numbers of 3.59 and 3.70, respectively, are eerily similar to Gray’s 3.61 and 3.72. From there on, however, the numbers separate abruptly.

Much like Dr. Jekyll and Mr. Hyde represent the good and the bad within a person, Hendricks’ and Gray’s seasons represent two sides of the same coin. On the one hand, circumstantial factors and good fortune turned Hendricks’ very good performance into a historic season, while a different set of circumstances and some bad fortune turned Gray’s good performance into merely an average one. In this piece, we’ll take a look at the factors that influenced these diametrically opposed results.

I’ll start by saying that Kyle Hendricks had a remarkable and impressive season. He had an average strikeout rate (8.05 K/9), didn’t walk many batters (2.08 BB/9), and allowed very few longballs (0.71 HR/9), which resulted in a really good 3.20 FIP that ranked 4th in the majors. His ERA, however, ended up all the way down at 2.13, a whopping 1.07 runs lower than his FIP. Big as that difference is, it’s not unheard of, as nearly 2% of individual seasons by starters in the history of the game have had an E-F (ERA minus FIP) of -1.07 or lower. Nonetheless, that difference is hardly sustainable through multiple seasons. In major-league history, out of 2259 pitchers with at least 500 innings pitched, only two had a career E-F below -1.00, and both of them were full-time relievers (in case you’re curious, they are Alan Mills and Al Levine).

On the other side of the spectrum, Jon Gray also had a very solid season. He had an outstanding 9.91 strikeouts per 9 innings (that ranked him 9th among qualifying starters), an average walk rate of 3.16 BB/9, and a solid home-run rate (0.94 HR/9), lower than league average despite pitching half of his innings at Coors Field. His performance was good enough for a 3.60 FIP, but his actual ERA rocketed to 4.61. This 1.01 positive difference is just as unusual as Hendricks’ negative one, as about 2% of individual seasons throughout history have resulted in differences of 1.01 or higher. For visualizing purposes, here’s a table summarizing both pitchers’ numbers.
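As a quick sanity check on those two gaps, E-F can be recomputed from the ERA and FIP figures quoted above. This is only a sketch; the dictionary layout and the function name are illustrative choices, not anything taken from FanGraphs.

```python
# ERA and FIP values as quoted in the text for the 2016 season.
pitchers = {
    "Kyle Hendricks": {"ERA": 2.13, "FIP": 3.20},
    "Jon Gray": {"ERA": 4.61, "FIP": 3.60},
}

def era_minus_fip(era, fip):
    """E-F: negative means the pitcher allowed fewer runs than FIP expected."""
    return round(era - fip, 2)

e_minus_f = {name: era_minus_fip(s["ERA"], s["FIP"]) for name, s in pitchers.items()}

# Gap between the two ERAs, the difference mentioned earlier in the piece.
era_gap = round(pitchers["Jon Gray"]["ERA"] - pitchers["Kyle Hendricks"]["ERA"], 2)

for name, value in e_minus_f.items():
    print(f"{name}: E-F = {value:+.2f}")
```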

So the question still remains: what were the determining factors in these two pitchers having such a massive difference in results? Let’s dive right into it.

First of all, I decided to look at the correlation factors between E-F and a wide array of pitching stats, using data from every pitcher in MLB history with 500+ innings. As a general rule of thumb, a correlation factor between 0.40 and 0.69 indicates a strong relationship between the two variables. The following table shows the stats that had at least a 0.40 correlation factor with E-F:

Welp, that’s a pretty lame table. Keep in mind, I analyzed correlations for stats as varied as pitch-type percentages, pitch-type vertical and horizontal movements, and Soft, Medium, and Hard-hit rates, as well as K, BB, and HR per 9, or HR/FB%. None of those had even a moderate relationship with E-F. So let’s stick with the stats presented on the table.
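For readers who want to try this kind of screen themselves, here is a minimal sketch of a Pearson correlation with the 0.40 "strong" cutoff mentioned above. The five data rows are made up purely for illustration and are not real pitcher lines, and treating anything above 0.69 as still "strong" is an assumption, since the rule of thumb only covers 0.40 to 0.69.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def is_strong(r, cutoff=0.40):
    """True when |r| reaches the cutoff; the sign only gives the direction."""
    return abs(r) >= cutoff

# Hypothetical career BABIP and E-F values for five imaginary pitchers.
babip = [0.270, 0.285, 0.300, 0.310, 0.325]
e_minus_f = [-0.40, -0.05, -0.10, 0.20, 0.35]

r = pearson(babip, e_minus_f)
print(f"r = {r:.2f}, strong: {is_strong(r)}")
```

Running the same two functions over each candidate stat in turn, and keeping only the ones that pass `is_strong`, is one way to reproduce this sort of filter.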

The first two stats are really no surprise. FIP basically assumes league-average BABIP and LOB% to estimate what a pitcher’s ERA should look like. So, if a pitcher has a high BABIP, FIP is going to estimate a lower ERA than the actual one, resulting in a higher E-F; thus the positive correlation. On the other hand, if a pitcher has a higher LOB%, he’ll allow fewer runs than his FIP would suggest, resulting in a lower E-F. This explains the negative correlation shown in the table. The last stat, however, came as a real surprise, at least for me. ERA seems to be positively correlated with E-F, which means that pitchers with higher ERA tend to have higher E-F than pitchers with lower ERA.
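It may help to see why FIP is blind to BABIP and LOB% in the first place. The sketch below uses the standard public FIP formula, which only looks at home runs, walks, hit batters, and strikeouts. The pitcher line and the constant are hypothetical; the real constant changes every year and should be looked up rather than taken from here.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """Fielding Independent Pitching.

    Only home runs, walks, hit batters and strikeouts enter the formula,
    so hits on balls in play and stranded runners never show up in it.
    `ip` must be a true decimal (190.1 innings in box-score notation
    means 190 and one third).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical season line and an illustrative league constant.
example_fip = fip(hr=20, bb=50, hbp=5, k=180, ip=200.0, constant=3.10)
print(f"FIP = {example_fip:.3f}")
```

Two pitchers with identical inputs here get the same FIP no matter how many of their balls in play fell for hits, which is exactly the gap that E-F picks up.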

The next logical step would be to determine which factors, if any, explain BABIP and/or LOB% among pitchers. Using the same pitching stats as in the previous step, I ran correlations with BABIP and LOB% separately. The following table shows the stats that had a strong (0.40 to 0.69) or moderate (0.30 to 0.39) relationship.

As was the case in the first table, both of these stats are correlated strongly with E-F, showing factors of 0.58 and -0.42, respectively. It doesn’t come as a shock either, that they are strongly correlated with each other. The negative correlating factor (-0.42) indicates, as you would expect, that a high BABIP leads to a low LOB%, and vice versa. On the BABIP side, a positive strong relationship with ERA is almost too obvious, as more balls in play falling for hits leads to more runs being scored. Also, since fly balls in play (not counting home runs) turn more often into outs than ground balls do, it makes sense that BABIP holds a negative relationship with the former, and a positive one with the latter. This fact, however, goes against a somewhat popular belief that ground-ball pitchers tend to have lower BABIPs.

The factors that correlate to LOB% are more interesting. The first one is not unexpected: a higher strikeout rate seems to lead to more runners getting stranded, and that’s a pretty easy concept to wrap your head around. The second one, however, is really mind-boggling, and I really can’t say I can find a reasonable explanation for it. It indicates that the higher the home-run rate allowed by a pitcher, the more runners are going to be left on base. It is quite possible that this is just a spurious correlation, having no causality at all. Finally, the last factor listed on the table is very interesting and useful in this particular case. It suggests that high percentages of soft contact lead to higher LOB%. We’ll get to that later on in this article.

So let’s go back to our pitchers and check if any of this makes sense. We know that E-F is mainly affected by BABIP and LOB%. Hendricks and Gray had very different numbers in these two stats. The Cubs’ righty had a .250 BABIP and a LOB% of 81.5, while the Rockies’ fireballer had .308 and 66.4%. Considering that the league averages were .298 and 72.9%, respectively, we can say that Hendricks did considerably better than average, while Gray did just the opposite. So far so good, right? These facts go a long way towards explaining the differing outcomes. However, BABIP and LOB% aren’t exactly pitcher-dependent; in fact, they’re the marquee stats for the generic term “luck.”
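For reference, here is a sketch of how the two stats are usually computed, followed by the gaps to league average for both pitchers. The formulas are the commonly published ones (the LOB% version is the FanGraphs-style estimate); the raw counting line is hypothetical and is there only to show the functions run. The BABIP and LOB% values for Hendricks, Gray, and the league are the ones quoted above.

```python
LEAGUE = {"BABIP": 0.298, "LOB%": 72.9}  # 2016 league averages quoted above

def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: non-homer hits over balls in play."""
    return (h - hr) / (ab - k - hr + sf)

def lob_pct(h, bb, hbp, r, hr):
    """Left-on-base percentage, expressed on a 0-100 scale."""
    return 100 * (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)

# Hypothetical raw line, not either pitcher's real totals.
toy_babip = babip(h=150, hr=20, ab=700, k=180, sf=5)
toy_lob = lob_pct(h=150, bb=50, hbp=5, r=70, hr=20)

# Figures quoted in the text.
quoted = {
    "Hendricks": {"BABIP": 0.250, "LOB%": 81.5},
    "Gray": {"BABIP": 0.308, "LOB%": 66.4},
}
gaps = {
    name: {stat: round(value - LEAGUE[stat], 3) for stat, value in stats.items()}
    for name, stats in quoted.items()
}

for name, g in gaps.items():
    print(f"{name}: BABIP {g['BABIP']:+.3f}, LOB% {g['LOB%']:+.1f} points vs. league")
```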

Looking at the stats from the second table, few of them help out in figuring this out. High strikeout rates, for example, are supposed to increase LOB%, but Gray still managed a really low 66.4% despite a 9.91 K/9. On the other hand, Hendricks’ 81.5% LOB ranked 5th among qualified starters, even though his strikeout rate of 8.05 was right around league average. Similarly, groundball percentage is shown to have a positive correlation with BABIP. Nonetheless, Hendricks’ higher-than-average rate of 48.4% (league average was 44.7%) resulted in a ridiculously low BABIP of .250, while Gray’s below-average rate of 43.5% came with a .308 BABIP. Almost the same thing happens when you look at the fly-ball rates.

The only factor from that second table that does make sense in these particular examples is soft-contact rate. Hendricks ranked 1st in this regard among qualified starters, with an impressive 25.1% (league average was 18.8%), while Gray had a below-average rate of 17.8%, which ranked him 50th out of 73 qualified starters. This stat is very much pitcher-dependent, and it does help explain some of the differences in LOB%. It has, however, a moderate relationship with LOB%, as evidenced by its factor of -0.37. Is that enough to account for the massive difference in the results? Intuitively, I’ll say no. There is one more factor, however, that we haven’t even discussed yet.

FIP stands for Fielding Independent Pitching, so the very thing that FIP is trying to subtract from the equation might hold the key to answering our question. Defensive performances can heavily influence the outcome of the game, and make up a big chunk of what we generally call “luck” in a pitcher’s final results. In order to have a numerical confirmation of this idea, I looked at the correlations between teams’ yearly defensive component of WAR and its staff’s BABIP, LOB%, and E-F. The data I used for this exercise was every individual team season from 1989 (the first year in which play-by-play data contained information on hits and outs location) to 2016.

We can see here that a team’s defense has a strong correlation with all three of the stats, especially E-F. Higher values of the defensive component of WAR lead to lower BABIP, higher LOB%, and lower E-F, just as you would expect.

Saying that the Cubs had a great defensive performance this year is an understatement. Not only was it the best defense in 2016 by a bunch, it was also the best defense of the last 17 years, according to FanGraphs’ defensive component of WAR. Of the 814 individual team seasons played in MLB since 1989, this year’s Cubs rank 8th. That’ll put a serious dent in opponents’ BABIP. In fact, the Cubs’ average on balls in play of .255 (yes, that is the whole pitching staff’s BABIP) is the absolute lowest since the ’82 Padres. Oh, and also the Cubs pitching staff’s LOB% of 77.5% is tied for 2nd highest since 1989. All of this adds up to a team E-F of -0.62. Wow. Just wow.

The Rockies defense, on the other hand, wasn’t bad, but it also wasn’t great. According to FanGraphs, it was 17.9 runs above average, which ranked 12th in MLB. Again, that’s really not bad at all, just miles away from the 115.5 runs above average the Cubs had. The Rockies’ staff as a whole had a .317 BABIP, and a 68.0% LOB%; not unexpected from a team that plays half their games at altitude. Still, both of these values are worse than league average, resulting in a team E-F of 0.54.

All in all, Kyle Hendricks still had a better season than Jon Gray, and people will remember the 2.13 ERA and not the 4.61. This analysis just puts it a little bit more in perspective, and helps shed some light on the little details that make big differences in the course of a long season.

The old football adage says that “defense wins championships.” That doesn’t really apply to baseball, but in the future, when I think back to the 2016 Cubs, I’ll definitely think about their defense.

2016 ALCS Game One: Batter vs. Pitcher Stats

by Nick Rabasco

October 18, 2016

The FanGraphs Twitter page tweeted out a bingo card for Game One of the ALCS. As I looked through it, I thought it was a terrific idea by Michelle Jay and a fun way to follow the game that night. I was going to play along, but then I had another idea. Some slots were much more likely to happen, such as the “Pitcher v hitter stats are mentioned” slot. I figured I would let somebody else receive a t-shirt and just count up exactly how many times the TBS broadcast team mentioned batter vs. pitcher stats. We all know announcers love doing this, and we all know that it’s pretty useless for predicting the outcome of that particular at-bat. I just thought it would be cool to experiment and see how many times they actually mentioned these stats.

First, I’ll just go over the final numbers for batter vs. pitcher stats. There were 65 batters in this game, and batter vs. pitcher stats were either mentioned by the announcers or shown on a graphic for eight of those batters. There were two separate times where they showed a graphic and then mentioned the stats later in the plate appearance, or vice versa. Four of the eight instances occurred when the Jays were hitting against Corey Kluber, three of the eight came when Andrew Miller was pitching, and the last one came when Marco Estrada was on the mound. It’s interesting that they would mention those stats more often when a reliever is pitching, considering the sample size is sure to be even smaller against relievers, rather than starters.

For fun, I marked each occurrence and tried to quickly type out how the announcer mentioned these stats:

Top 1, Josh Donaldson vs. Corey Kluber: “He’s got some pretty good numbers, 6 for 16 with a jack, so he sees him well” -Cal Ripken
Top 1, Russell Martin vs. Corey Kluber: “Martin is only 2 for 10 in his career against Kluber, both home runs…in fact, two of his last seven off Kluber have been home runs” -Ernie Johnson (graphic added later in the plate appearance reading “2 for last 7 off Kluber with 2 HR”)
Top 2, Michael Saunders vs. Corey Kluber: “Saunders steps in, he’s 3 for 8 in his career against Kluber, and he fouls it off” -Ernie Johnson
Top 6, Michael Saunders vs. Corey Kluber: “Saunders with his two hits, now 5 for 10 off Kluber” -Ron Darling
Bottom 6, Jason Kipnis vs. Marco Estrada: graphic shown reading “0 for 7 4 K VS ESTRADA”
Top 7, Melvin Upton Jr. vs. Andrew Miller: “Upton’s got some numbers against Miller, 5 for 12 with three home runs” -Ron Darling (“That is some numbers” -Cal Ripken)
Top 8, Edwin Encarnacion vs. Andrew Miller: “Encarnacion in his last six at-bats against Miller a couple of home runs and a double” -Ernie Johnson
Top 8, Jose Bautista vs. Andrew Miller: graphic shown reading “.286 (2 for 7) 1 HR 2 BB VS MILLER” (later in the plate appearance: “One of the two hits that Bautista has off Miller…long ball” -Ron Darling)

I’m not trying to knock these announcers by saying that they’re not good at what they do or anything. I would be a terrible announcer. I just think these stats are pretty useless and it was interesting to see how many times they actually mentioned them during a game. Mike Petriello pointed out on Twitter an example of why these numbers aren’t good to look at.

This would be kind of fun to track during the regular season for the really good ones, such as “so and so: 1 for 2 (.500), single career vs. so and so.” Maybe this can be a new metric or something, bpBAAR (batter pitcher Baseball Announcer Above Replacement).

Clustering Pitchers With PITCHf/x

by jonyanks620

October 18, 2016

At any point, feel free to scroll down to the bottom to see some of the tables of pitcher clusters.

Clustering Pitches

Clustering individual pitches using data from PITCHf/x is a fairly simple task. All you need to do is pick out the important attributes that you believe define a pitch (velocity, movement, etc.) and use a clustering algorithm, such as K-Means clustering.

With K-Means clustering, you decide what K (the number of clusters) should be. For my analysis, I chose K to be 500 (rather arbitrarily). Different pitch clusters can represent the same type of pitch (e.g., fastball) but with varying attributes. For example, clusters 50 and 100 might both correspond to fastballs, but cluster 50 might be a typical Chris Young fastball whereas cluster 100 might be a typical Aroldis Chapman fastball.

One important point to remember is that you, the analyst, must decide what the clusters represent. By looking at attributes of the pitches in a given cluster, you might identify the cluster as “lefty changeups” or “submariner fastballs” (which is actually a category you will discover).
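To make the idea concrete, here is a bare-bones K-Means sketch in plain Python. It is not the code behind this analysis: the six "pitches" are invented (velocity, horizontal movement, vertical movement), K is 2 instead of 500, and the starting centers are picked in a deliberately simple way. A real run would likely use a library implementation and put the features on a common scale first.

```python
import math

def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))

def kmeans(points, k, iterations=20):
    """Very small K-Means: alternate assignment and center-update steps."""
    # Naive initialisation: take k starting centers spread through the input.
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Invented pitches: (velocity in mph, horizontal movement, vertical movement).
pitches = [
    (88.0, -4.0, 9.0), (87.5, -3.5, 9.5), (88.5, -4.5, 10.0),   # slow fastballs
    (100.0, 6.0, 11.0), (101.0, 5.5, 10.5), (99.5, 6.5, 11.5),  # very hard fastballs
]

centers, labels = kmeans(pitches, k=2)
print(labels)
```

The algorithm only returns numbered groups. Calling one of them "slow fastballs" is the analyst's job, which is the point made above.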

The Problem of Clustering Pitchers

We can identify every pitch that a pitcher throws as belonging to a cluster from 1 to 500. Therefore, we know the distribution of pitch clusters for a given pitcher. The difficult problem, however, is how do we compare two pitchers using this information? Let’s say we have two pitchers:

Pitcher A’s pitches are 50% from cluster 1 and 50% from cluster 200.
Pitcher B’s pitches are 33% from cluster 1, 33% from cluster 300, and 33% from cluster 139.

The question remains, are Pitcher A and Pitcher B similar pitchers?

The problem of clustering pitchers is a more complicated one than clustering pitches because we now have a collection of pitches instead of just individual pitches to compare. In order to cluster pitchers, I use a model that is typically used for topic modeling called Latent Dirichlet Allocation (LDA).
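To see why the raw cluster shares are awkward to compare directly, here is a sketch of one naive measure: the share of pitches two pitchers throw from exactly the same cluster. This is not the method used in this piece, and the function name is made up. The 33% shares for Pitcher B are read as one third each.

```python
def shared_mass(p, q):
    """Sum, over all clusters, of the smaller of the two pitchers' shares."""
    return sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in set(p) | set(q))

# Cluster id -> share of that pitcher's pitches, from the example above.
pitcher_a = {1: 0.50, 200: 0.50}
pitcher_b = {1: 1 / 3, 300: 1 / 3, 139: 1 / 3}

overlap = shared_mass(pitcher_a, pitcher_b)
print(f"Shared mass: {overlap:.2f}")
```

Only cluster 1 counts toward the overlap. Clusters 200 and 300 are treated as completely unrelated even if they hold nearly identical pitches, so a measure like this can badly understate how similar two pitchers are.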

An Aside on LDA

In LDA for topic modeling, our data is a collection of documents.

Let’s imagine that our collection of documents is articles from the New York Times. There are global topics that govern how these articles are generated. For example, if you think of a newspaper, the topics might be sports, finance, health, politics, etc. Additionally, each article can be a mixture of these topics. We might imagine there is an article in the sports section titled, “Yankees payroll exceeds $300 million”, which our algorithm may discover is 50% about sports and 50% about finance.

Similar to what is mentioned above, the analyst must figure out what the topics actually are. You do not tell the algorithm that there is a sports topic. You discover that the topic is sports by observing that the most probable words are “baseball”, “Jeter”, “LeBron”, “touchdown”, etc. The algorithm will tell you that a particular document is 50% about topic 1 and 50% about topic 20, but you must ultimately infer what topic 1 and topic 20 are.

I am harping on this point mainly just to mention that there is no magic to these clustering algorithms. An algorithm can cluster data, but it cannot tell you what these clusters mean.

Relevance of LDA to Pitchers

Anyway, how can this model be used to analyze pitchers? We just need to use our imagination. Instead of a collection of documents, we now have a collection of pitcher seasons. Whereas each document is made up of a collection of words, each pitcher season is made up of a collection of pitches. We have already discretized each pitch using K-Means clustering in order to create our own “dictionary” of pitches. In our baseball model, we imagine that each pitcher is a mixture of repertoires, whereas in topic modeling, each document was a mixture of topics. We can then cluster pitchers together by figuring out who has the most similar repertoires.

Nitty Gritty Details

If you are not interested in getting into the nitty gritty details, feel free to skip ahead to the next section to just see the cluster groupings.

Data used is from 2007-2014.
The dictionary of pitches (500 clusters) was created by running K-Means using all of the pitches from 2014. The choice of 2014 is arbitrary, but I used just one year’s worth of data because I thought it might be a sufficient amount and it was much quicker to run K-Means.
The PITCHf/x attributes that were used to cluster pitches were start_speed, pfx_x/pfx_z (horizontal/vertical movement), px/pz (horizontal/vertical location), vx0/vz0 (components of velocity).
For each pitcher from 2007-2014, each pitch was assigned to its closest cluster (determined by distance to the cluster center). I filtered out pitcher seasons in which the pitcher threw fewer than 500 pitches.
I then ran LDA on pitcher seasons, choosing the number of repertoires (topics) to be 5.
I used the method from this paper to get a vector representation of each pitcher season. I could have used the inferred repertoire proportions as my vector representations, but for various reasons, this did not produce as nice of clusters.
Finally, I ran K-Means (K=100) on these vectors to get clusters of pitchers.
Whereas in topic modeling, it is often interesting to interpret what the global topics actually are, I am not really interested in what the global “repertoires” are for the model. I am really using LDA as a dimensionality reduction technique to produce smaller vectors (5 vs. 500) that can be clustered together.
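The step that turns pitcher seasons into "documents" can be sketched as follows: assign each pitch to its closest cluster center, count the assignments, and drop seasons under 500 pitches. Everything named here is hypothetical (the two toy centers, the pitcher labels, the function names), and the LDA and final K-Means steps are not shown; in practice those would come from a library.

```python
import math
from collections import Counter

MIN_PITCHES = 500  # seasons with fewer pitches are filtered out, as described above

def closest_cluster(pitch, centers):
    """Index of the cluster center nearest to this pitch."""
    return min(range(len(centers)), key=lambda c: math.dist(pitch, centers[c]))

def build_corpus(pitcher_seasons, centers, min_pitches=MIN_PITCHES):
    """Map each qualifying pitcher season to a vector of per-cluster pitch counts."""
    corpus = {}
    for season_key, pitches in pitcher_seasons.items():
        if len(pitches) < min_pitches:
            continue
        counts = Counter(closest_cluster(p, centers) for p in pitches)
        corpus[season_key] = [counts.get(c, 0) for c in range(len(centers))]
    return corpus

# Toy "dictionary" with 2 centers instead of 500: (velocity, movement_x, movement_z).
centers = [(88.0, -4.0, 9.0), (100.0, 6.0, 11.0)]

pitcher_seasons = {
    ("Pitcher A", 2014): [(88.5, -4.0, 9.5)] * 300 + [(99.0, 6.0, 11.0)] * 300,
    ("Pitcher B", 2014): [(87.0, -3.0, 9.0)] * 100,  # under 500 pitches, dropped
}

corpus = build_corpus(pitcher_seasons, centers)
print(corpus)
```

Each count vector plays the role of a document's word counts, which is the shape of input a topic model expects.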

Some Observations

The actual clusters along with some relevant FanGraphs statistics are provided below. Each table is sortable. For brevity, I have only included clusters in which there are 10 or fewer pitchers. Only the first cluster shown (cluster 3) has more than 10 pitchers, which I simply included to demonstrate that a cluster could be quite big.

As is probably expected, clusters are almost always entirely righties or lefties even though this is not an input to the model.
Guys with similar numbers of batters faced cluster together. This is by design, as the way I determined the repertoire proportions accounts for the number of times a particular pitch is thrown.
Sometimes weird clusters can form, such as Cluster 37, which contains both Chapman and Wakefield. Cluster 37 is mostly cohesive with hard-throwing left-handers and I believe Wakefield ends up here simply because he did not fit well into any cluster.
This is not to say that the algorithm cannot find clusters of knuckleballers. Cluster 14 is all R.A. Dickey from years 2011-2014.
There are also other clusters that contain exclusively one (or almost one) pitcher. Cluster 8 is 5 Kershaw years and one Hamels year. Cluster 68 is 5 Verlander years. I believe these clusters form partially because their stuff is so good. There are other pitchers who fall into almost exclusively one cluster but who are joined by many other pitchers. Another factor is that they might be able to repeat their mechanics so well that they remain in the same cluster because they are always throwing the same pitch types.
Clusters of individual pitchers also happen if a pitcher has an incredibly unique style. Justin Masterson has his own cluster because he is such an extreme ground-ball pitcher. Josh Collmenter does as well due to the extreme rise he generates on his “fastball”.
Cluster 29 contains just Kershaw’s 2014 season and J.A. Happ’s 2009 season. If you do a Ctrl-F for J.A. Happ, you will find him in some pretty flattering clusters. This is especially interesting because he did not have particularly good seasons from 2007-2014, but he has been quite good the last two years. This is not to suggest that these clusters can uncover hidden gems, but it’s not fully out of the realm of possibility.
Most clusters produce quite similar ground-ball percentages. One of the factors that goes into clustering pitches (and therefore pitchers) is horizontal and vertical movement, which plays a huge role in a pitcher’s ability to produce ground balls.
Submarine pitchers always end up together. Check out Clusters 9, 60, and 92.

Overall, I think this is pretty interesting stuff. I was honestly surprised that the clusters turned out to be as cohesive as they were. Additionally, besides being a descriptive tool, I have to wonder whether this information can be used for predictive purposes. For example, we often talk about regression to the mean when discussing a player’s performance, whether it be a pitcher or a batter. It is possible that the appropriate mean for many pitchers is the mean of the cluster that they happen to fall into.
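To make the cluster-mean idea concrete, here is a rough sketch that computes a cluster's mean ground-ball rate and pulls an observed rate toward it. The (TBF, GB_pct) pairs are copied from the Cluster 18 table below; the `shrink` function and its `weight` parameter are purely hypothetical, since the post does not propose a specific way of regressing to the cluster mean.

```python
# (TBF, GB_pct) rows copied from the Cluster 18 table in this article.
cluster_18 = [(944, 64.4), (803, 58.0), (906, 55.7), (830, 59.0)]


def simple_mean(rows):
    """Unweighted mean of GB_pct across the pitcher seasons in a cluster."""
    return sum(gb for _, gb in rows) / len(rows)


def weighted_mean(rows):
    """Mean of GB_pct weighted by batters faced (TBF)."""
    total_tbf = sum(tbf for tbf, _ in rows)
    return sum(tbf * gb for tbf, gb in rows) / total_tbf


def shrink(observed, cluster_mean, weight):
    """Blend an observed rate with the cluster mean.

    `weight` is a hypothetical trust level in the observation:
    1.0 keeps the observed value, 0.0 returns the cluster mean.
    """
    return weight * observed + (1 - weight) * cluster_mean


print(round(simple_mean(cluster_18), 2), round(weighted_mean(cluster_18), 2))
```

How much weight the observation deserves would have to be estimated from data; nothing here says what that value should be.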

Cluster 3

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2009	Chris Carpenter	Cardinals	750	6.73	1.78	0.33	55.0	28.0	4.6	5.5
2010	Hiroki Kuroda	Dodgers	810	7.29	2.20	0.69	51.1	32.1	8.0	4.3
2010	Gavin Floyd	White Sox	798	7.25	2.79	0.67	49.9	32.1	7.6	4.1
2008	Hiroki Kuroda	Dodgers	776	5.69	2.06	0.64	51.3	28.6	7.6	3.6
2012	Doug Fister	Tigers	673	7.63	2.06	0.84	51.0	26.7	11.6	3.4
2011	Josh Beckett	Red Sox	767	8.16	2.42	0.98	40.1	42.2	9.6	3.3
2011	Michael Pineda	Mariners	696	9.11	2.89	0.95	36.3	44.8	9.0	3.2
2012	A.J. Burnett	Pirates	851	8.01	2.76	0.80	56.9	24.3	12.7	3.0
2013	Rick Porcello	Tigers	736	7.22	2.14	0.92	55.3	23.7	14.1	2.9
2008	Carlos Zambrano	Cubs	796	6.20	3.43	0.86	47.2	34.9	9.0	2.8
2013	Andrew Cashner	Padres	707	6.58	2.42	0.62	52.5	28.7	8.1	2.7
2012	Jeff Samardzija	Cubs	723	9.27	2.89	1.03	44.6	33.1	12.8	2.7
2010	Scott Baker	Twins	725	7.82	2.27	1.22	35.6	43.5	10.2	2.6
2014	Kyle Gibson	Twins	757	5.37	2.86	0.60	54.4	26.6	7.8	2.3
2012	Tim Hudson	Braves	749	5.13	2.41	0.60	55.5	25.2	8.3	2.1
2014	Henderson Alvarez	Marlins	772	5.34	1.59	0.67	53.8	24.3	9.5	2.1
2008	Todd Wellemeyer	Cardinals	807	6.29	2.91	1.17	39.3	39.8	10.6	2.0
2010	Rick Porcello	Tigers	700	4.65	2.10	1.00	50.3	32.1	9.9	1.7
2011	Luke Hochevar	Royals	835	5.82	2.82	1.05	49.8	32.2	11.5	1.7
2008	Jason Marquis	Cubs	738	4.90	3.77	0.81	47.6	32.5	8.3	1.7
2014	Charlie Morton	Pirates	666	7.21	3.26	0.51	55.7	22.8	8.8	1.6
2012	Luis Mendoza	Royals	709	5.64	3.20	0.81	52.1	27.1	10.6	1.5
2009	Aaron Cook	Rockies	675	4.44	2.68	1.08	56.5	24.7	14.2	1.4
2014	Doug Fister	Nationals	662	5.38	1.32	0.99	48.9	34.2	10.1	1.4
2010	Mitch Talbot	Indians	696	4.97	3.90	0.73	47.8	35.3	7.0	1.2
2008	Armando Galarraga	Tigers	746	6.35	3.07	1.41	43.5	39.7	13.0	1.2
2008	Carlos Silva	Mariners	689	4.05	1.88	1.17	44.0	33.3	10.4	1.2
2009	Ross Ohlendorf	Pirates	725	5.55	2.70	1.27	40.6	42.1	11.1	1.2
2008	Vicente Padilla	Rangers	757	6.68	3.42	1.37	42.7	38.1	12.5	1.1
2012	Luke Hochevar	Royals	800	6.99	2.96	1.31	43.3	35.0	13.5	1.1
2012	Derek Lowe	– – –	640	3.47	3.22	0.63	59.2	21.0	9.1	1.0
2013	Edinson Volquez	– – –	777	7.50	4.07	1.00	47.6	29.6	11.9	0.9
2011	Chris Volstad	Marlins	719	6.36	2.66	1.25	52.3	27.7	15.5	0.7
2010	Jeremy Bonderman	Tigers	754	5.89	3.16	1.32	44.7	39.2	11.4	0.7
2010	Brad Bergesen	Orioles	746	4.29	2.70	1.38	48.7	36.6	11.9	0.6
2014	Hector Noesi	– – –	733	6.42	2.92	1.46	38.0	40.6	12.7	0.3
2009	Armando Galarraga	Tigers	642	5.95	4.20	1.50	39.9	38.6	13.3	0.2
2008	Kyle Kendrick	Phillies	722	3.93	3.30	1.33	44.3	28.7	14.0	0.1
2014	Roberto Hernandez	– – –	722	5.74	3.99	1.04	49.7	29.9	12.2	0.0
2013	Lucas Harrell	Astros	707	5.21	5.15	1.17	51.5	27.4	14.3	-0.8

Cluster 5

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2010	Cliff Lee	– – –	843	7.84	0.76	0.68	41.9	40.4	6.3	7.0
2011	Cliff Lee	Phillies	920	9.21	1.62	0.70	46.3	32.4	9.0	6.8
2009	Jon Lester	Red Sox	843	9.96	2.83	0.89	47.7	34.5	10.6	5.3
2014	Jose Quintana	White Sox	830	8.00	2.34	0.45	44.7	33.2	5.1	5.1
2013	Derek Holland	Rangers	894	7.99	2.70	0.85	40.8	36.4	8.8	4.3
2012	Matt Moore	Rays	759	8.88	4.11	0.91	37.4	42.9	8.6	2.7
2013	Wade Miley	Diamondbacks	847	6.53	2.93	0.93	52.0	27.2	12.5	1.8

Cluster 6

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2007	CC Sabathia	Indians	975	7.80	1.38	0.75	45.0	36.6	7.8	6.4
2014	Jake McGee	Rays	274	11.36	2.02	0.25	38.0	42.9	2.9	2.6
2014	Tyler Matzek	Rockies	503	6.96	3.37	0.69	49.7	30.3	8.3	1.7
2013	J.A. Happ	Blue Jays	415	7.48	4.37	0.97	36.5	46.0	7.6	1.1
2010	J.A. Happ	– – –	374	7.21	4.84	0.82	39.0	43.4	7.4	1.0
2009	Sean West	Marlins	467	6.10	3.83	0.96	40.2	40.8	8.0	1.0
2009	Andrew Miller	Marlins	366	6.64	4.84	0.79	48.0	30.0	9.3	0.7
2012	Drew Pomeranz	Rockies	434	7.73	4.28	1.30	43.9	35.9	13.6	0.7
2013	Jake McGee	Rays	260	10.77	3.16	1.15	42.5	38.8	12.9	0.6
2008	Jo-Jo Reyes	Braves	512	6.21	4.14	1.43	48.5	31.8	15.5	0.2

Cluster 8

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2013	Clayton Kershaw	Dodgers	908	8.85	1.98	0.42	46.0	31.3	5.8	7.1
2011	Clayton Kershaw	Dodgers	912	9.57	2.08	0.58	43.2	38.6	6.7	7.1
2012	Clayton Kershaw	Dodgers	901	9.05	2.49	0.63	46.9	34.0	8.1	5.9
2010	Clayton Kershaw	Dodgers	848	9.34	3.57	0.57	40.1	42.1	5.8	4.7
2009	Clayton Kershaw	Dodgers	701	9.74	4.79	0.37	39.4	41.6	4.1	4.4
2010	Cole Hamels	Phillies	856	9.10	2.63	1.12	45.4	37.9	12.3	3.5

Cluster 9

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2009	Peter Moylan	Braves	309	7.52	4.32	0.00	62.4	19.5	0.0	1.4
2014	Joe Smith	Angels	285	8.20	1.81	0.48	59.1	25.9	8.0	1.0
2011	Joe Smith	Indians	267	6.04	2.82	0.13	56.6	23.5	2.2	1.0
2009	Brad Ziegler	Athletics	313	6.63	3.44	0.25	62.3	19.7	4.4	1.0
2013	Brad Ziegler	Diamondbacks	297	5.42	2.71	0.37	70.4	10.8	12.5	0.6
2012	Brad Ziegler	Diamondbacks	263	5.50	2.75	0.26	75.5	7.7	13.3	0.6
2012	Joe Smith	Indians	278	7.12	3.36	0.54	58.0	24.9	8.3	0.6
2008	Cla Meredith	Padres	302	6.27	3.07	0.77	66.8	17.3	15.8	0.3
2010	Peter Moylan	Braves	271	7.35	5.23	0.71	67.8	21.3	13.5	-0.3

Cluster 14

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2012	R.A. Dickey	Mets	927	8.86	2.08	0.92	46.1	34.1	11.3	5.0
2011	R.A. Dickey	Mets	876	5.78	2.33	0.78	50.8	32.9	8.3	2.5
2014	R.A. Dickey	Blue Jays	914	7.22	3.09	1.09	42.0	37.6	10.7	1.7
2013	R.A. Dickey	Blue Jays	943	7.09	2.84	1.40	40.3	40.5	12.7	1.7

Cluster 16

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2013	Max Scherzer	Tigers	836	10.08	2.35	0.76	36.3	44.6	7.6	6.1
2014	Max Scherzer	Tigers	904	10.29	2.57	0.74	36.7	41.6	7.5	5.2
2011	Daniel Hudson	Diamondbacks	921	6.85	2.03	0.69	41.7	39.1	6.4	4.6
2012	Max Scherzer	Tigers	787	11.08	2.88	1.10	36.5	41.5	11.6	4.4
2014	Jeff Samardzija	– – –	879	8.28	1.76	0.82	50.2	30.5	10.6	4.1
2014	Lance Lynn	Cardinals	866	8.00	3.18	0.57	44.3	36.0	6.1	3.4

Cluster 18

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2008	Brandon Webb	Diamondbacks	944	7.27	2.58	0.52	64.4	20.4	9.6	5.5
2013	Justin Masterson	Indians	803	9.09	3.54	0.61	58.0	24.2	10.7	3.5
2012	Justin Masterson	Indians	906	6.94	3.84	0.79	55.7	25.0	11.4	2.3
2011	Derek Lowe	Braves	830	6.59	3.37	0.67	59.0	22.5	10.2	2.1

Cluster 20

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2010	John Danks	White Sox	878	6.85	2.96	0.76	45.4	38.9	7.4	4.4
2010	Brian Matusz	Orioles	760	7.33	3.23	0.97	36.2	45.0	7.9	3.0
2009	John Danks	White Sox	839	6.69	3.28	1.26	44.2	40.9	11.5	2.7
2013	Felix Doubront	Red Sox	705	7.71	3.94	0.72	45.6	34.4	7.8	2.2
2014	J.A. Happ	Blue Jays	673	7.58	2.91	1.25	40.6	39.5	11.5	1.0

Cluster 24

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2008	CC Sabathia	– – –	1023	8.93	2.10	0.68	46.6	31.7	8.8	7.3
2011	CC Sabathia	Yankees	985	8.72	2.31	0.64	46.6	30.3	8.4	6.4
2010	David Price	Rays	861	8.11	3.41	0.65	43.7	39.6	6.5	4.2

Cluster 29

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2014	Clayton Kershaw	Dodgers	749	10.85	1.41	0.41	51.8	29.2	6.6	7.6
2009	J.A. Happ	Phillies	685	6.45	3.04	1.08	38.4	42.9	9.5	1.7

Cluster 35

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2014	Chris Young	Mariners	688	5.89	3.27	1.42	22.3	58.7	8.8	0.1
2014	Marco Estrada	Brewers	624	7.59	2.63	1.73	32.7	49.5	13.2	-0.1

Cluster 36

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2011	Justin Masterson	Indians	908	6.58	2.71	0.46	55.1	26.7	6.3	4.2
2010	Justin Masterson	Indians	802	7.00	3.65	0.70	59.9	24.9	10.0	2.3

Cluster 37

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2012	Aroldis Chapman	Reds	276	15.32	2.89	0.50	37.3	42.9	7.4	3.3
2009	Matt Thornton	White Sox	291	10.82	2.49	0.62	46.4	36.3	7.7	2.3
2008	Matt Thornton	White Sox	268	10.29	2.54	0.67	53.0	27.4	10.9	1.7
2012	Drew Smyly	Tigers	416	8.52	2.99	1.09	39.9	41.3	10.3	1.7
2008	Clayton Kershaw	Dodgers	470	8.36	4.35	0.92	48.0	31.3	11.6	1.5
2008	Tim Wakefield	Red Sox	754	5.82	2.98	1.24	35.5	48.9	9.1	1.1
2011	Tim Wakefield	Red Sox	677	5.41	2.73	1.45	38.4	45.8	10.5	0.2

Cluster 38

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2013	Cliff Lee	Phillies	876	8.97	1.29	0.89	44.3	33.3	10.9	5.5
2008	Johan Santana	Mets	964	7.91	2.42	0.88	41.2	36.4	9.4	5.3
2010	Jon Lester	Red Sox	861	9.74	3.59	0.61	53.6	29.6	8.9	4.8
2012	CC Sabathia	Yankees	833	8.87	1.98	0.99	48.2	30.7	12.5	4.7
2008	Jon Lester	Red Sox	874	6.50	2.82	0.60	47.5	31.6	7.0	4.1
2013	Hyun-Jin Ryu	Dodgers	783	7.22	2.30	0.70	50.6	30.5	8.7	3.6
2014	Wei-Yin Chen	Orioles	772	6.59	1.70	1.11	41.0	37.5	10.5	2.4
2010	Jonathan Sanchez	Giants	812	9.54	4.47	0.98	41.5	43.7	9.8	2.3
2014	Wade Miley	Diamondbacks	866	8.18	3.35	1.03	51.1	28.0	13.9	1.6

Cluster 44

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2011	Cole Hamels	Phillies	850	8.08	1.83	0.79	52.3	32.6	9.9	4.9
2008	Cole Hamels	Phillies	914	7.76	2.10	1.11	39.5	38.7	11.2	4.8
2008	John Danks	White Sox	804	7.34	2.63	0.69	42.8	35.4	7.4	4.8
2009	Cole Hamels	Phillies	814	7.81	2.00	1.12	40.4	38.7	10.7	3.9
2014	Danny Duffy	Royals	606	6.81	3.19	0.72	35.8	46.0	6.1	1.9
2011	J.A. Happ	Astros	698	7.71	4.78	1.21	33.0	44.2	10.2	0.6

Cluster 46

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2010	Roy Halladay	Phillies	993	7.86	1.08	0.86	51.2	29.7	11.3	6.1
2013	Lance Lynn	Cardinals	856	8.84	3.39	0.62	43.1	34.4	7.4	3.7
2008	Mike Pelfrey	Mets	851	4.93	2.87	0.54	49.6	29.6	6.3	3.1
2009	A.J. Burnett	Yankees	896	8.48	4.22	1.09	42.8	39.2	10.8	3.0
2010	Roberto Hernandez	Indians	880	5.31	3.08	0.73	55.6	30.8	8.3	2.6
2009	Derek Lowe	Braves	855	5.13	2.91	0.74	56.3	25.8	9.4	2.5
2010	Derek Lowe	Braves	824	6.32	2.83	0.84	58.8	22.6	13.1	2.2

Cluster 49

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2014	Aroldis Chapman	Reds	202	17.67	4.00	0.17	43.5	34.8	4.2	2.8
2014	James Paxton	Mariners	303	7.18	3.53	0.36	54.8	22.6	6.4	1.2
2013	Rex Brothers	Rockies	281	10.16	4.81	0.67	48.8	32.5	9.3	0.9
2012	Antonio Bastardo	Phillies	224	14.02	4.50	1.21	27.7	50.0	12.5	0.8
2012	Tim Collins	Royals	295	12.01	4.39	1.03	40.9	42.8	11.8	0.7
2012	Christian Friedrich	Rockies	377	7.87	3.19	1.49	42.2	34.6	15.4	0.7
2013	Justin Wilson	Pirates	295	7.21	3.42	0.49	53.0	30.0	6.7	0.6
2011	Aroldis Chapman	Reds	207	12.78	7.38	0.36	52.7	30.8	7.1	0.5
2014	Justin Wilson	Pirates	256	9.15	4.50	0.60	51.3	34.4	7.3	0.2
2011	Mike Dunn	Marlins	267	9.71	4.43	1.29	38.5	46.0	12.2	-0.2

Cluster 51

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2009	Cliff Lee	– – –	969	7.03	1.67	0.66	41.3	36.5	6.5	6.3
2009	CC Sabathia	Yankees	938	7.71	2.62	0.70	42.9	37.3	7.4	5.9
2010	CC Sabathia	Yankees	970	7.46	2.80	0.76	50.7	34.1	8.6	5.1

Cluster 54

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2014	Hisashi Iwakuma	Mariners	709	7.74	1.06	1.01	50.2	28.7	13.2	3.1
2009	Justin Masterson	– – –	568	8.28	4.18	0.84	53.6	31.4	10.4	1.5
2014	Justin Masterson	– – –	592	8.11	4.83	0.84	58.2	21.6	14.6	0.4

Cluster 58

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2014	David Price	– – –	1009	9.82	1.38	0.91	41.2	38.1	9.7	6.0
2014	Jon Lester	– – –	885	9.01	1.97	0.66	42.4	37.0	7.2	5.6
2012	Gio Gonzalez	Nationals	822	9.35	3.43	0.41	48.2	30.0	5.8	5.0
2011	David Price	Rays	918	8.75	2.53	0.88	44.3	36.9	9.7	4.4
2013	Gio Gonzalez	Nationals	819	8.83	3.50	0.78	43.9	33.3	9.7	3.2
2011	Gio Gonzalez	Athletics	864	8.78	4.05	0.76	47.5	34.1	8.9	3.1
2010	Gio Gonzalez	Athletics	851	7.67	4.13	0.67	49.3	35.3	7.4	3.1

Cluster 60

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2011	Brad Ziegler	– – –	239	6.79	2.93	0.00	68.6	13.4	0.0	1.0
2007	Cla Meredith	Padres	342	6.67	1.92	0.68	72.0	13.6	17.1	1.0
2008	Brad Ziegler	Athletics	229	4.53	3.32	0.30	64.7	18.8	6.3	0.5
2013	Joe Smith	Indians	259	7.71	3.29	0.71	49.1	30.1	9.6	0.5
2008	Chad Bradford	– – –	241	2.58	2.28	0.46	66.5	16.0	9.4	0.4
2012	Cody Eppley	Yankees	194	6.26	3.33	0.59	60.3	19.1	11.1	0.3
2008	Joe Smith	Mets	271	7.39	4.41	0.57	62.6	17.9	12.5	0.3
2009	Cla Meredith	– – –	283	5.10	3.44	0.55	62.9	21.1	8.9	0.2
2010	Brad Ziegler	Athletics	257	6.08	4.15	0.59	54.4	26.9	8.2	0.1
2014	Brad Ziegler	Diamondbacks	281	7.25	3.22	0.67	63.8	18.9	13.5	0.1

Cluster 68

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2009	Justin Verlander	Tigers	982	10.09	2.36	0.75	36.0	42.8	7.4	7.7
2012	Justin Verlander	Tigers	956	9.03	2.27	0.72	42.3	35.6	8.3	6.8
2011	Justin Verlander	Tigers	969	8.96	2.04	0.86	40.2	42.1	8.8	6.4
2010	Justin Verlander	Tigers	925	8.79	2.85	0.56	41.0	40.3	5.6	6.3
2013	Justin Verlander	Tigers	925	8.95	3.09	0.78	38.4	38.9	7.8	4.9

Cluster 69

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2008	Manny Parra	Brewers	741	7.97	4.07	0.98	51.6	26.6	13.5	2.3
2014	Drew Smyly	– – –	618	7.82	2.47	1.06	36.6	43.4	9.5	2.2
2012	J.A. Happ	– – –	627	8.96	3.48	1.18	44.0	38.9	11.9	1.9

Cluster 70

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2014	Gerrit Cole	Pirates	571	9.00	2.61	0.72	49.2	31.8	9.4	2.3
2009	Luke Hochevar	Royals	631	6.67	2.90	1.45	46.6	35.8	13.8	1.0
2012	Joe Kelly	Cardinals	457	6.31	3.03	0.84	51.7	27.5	11.0	0.9
2008	Sidney Ponson	– – –	612	3.85	3.18	0.93	54.5	26.2	10.9	0.9
2013	Joe Kelly	Cardinals	532	5.73	3.19	0.73	51.1	28.2	8.9	0.7
2009	Roberto Hernandez	Indians	596	5.67	5.03	1.15	55.2	27.0	13.7	0.0

Cluster 71

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2008	Chris Young	Padres	434	8.18	4.22	1.14	21.7	53.4	8.7	1.4
2012	Chris Young	Mets	493	6.26	2.82	1.25	22.3	58.2	7.7	1.2
2013	Josh Collmenter	Diamondbacks	384	8.32	3.23	0.78	32.7	46.8	6.9	1.0
2012	Josh Collmenter	Diamondbacks	375	7.97	2.19	1.30	37.4	43.1	11.5	0.8
2009	Chris Young	Padres	336	5.92	4.74	1.42	30.2	51.7	10.0	0.0

Cluster 72

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2014	Madison Bumgarner	Giants	873	9.07	1.78	0.87	44.4	35.8	10.0	4.0
2013	Jon Lester	Red Sox	903	7.47	2.83	0.80	45.0	35.4	8.3	3.5

Cluster 77

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2011	Josh Collmenter	Diamondbacks	621	5.83	1.63	0.99	33.3	47.0	7.7	2.3
2014	Josh Collmenter	Diamondbacks	719	5.77	1.96	0.90	38.8	39.9	8.3	1.9

Cluster 78

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2007	Rich Hill	Cubs	812	8.45	2.91	1.25	36.0	42.9	11.7	3.1
2014	Tyler Skaggs	Angels	464	6.85	2.39	0.72	50.1	30.9	8.7	1.5
2011	Danny Duffy	Royals	474	7.43	4.36	1.28	37.5	40.3	11.5	0.5
2010	Manny Parra	Brewers	560	9.52	4.65	1.33	47.2	34.5	14.8	0.3

Cluster 79

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2012	David Price	Rays	836	8.74	2.52	0.68	53.1	27.0	10.5	5.0
2011	C.J. Wilson	Rangers	915	8.30	2.98	0.64	49.3	31.9	8.2	4.9
2010	C.J. Wilson	Rangers	850	7.50	4.10	0.44	49.2	33.5	5.3	4.1
2013	C.J. Wilson	Angels	913	7.97	3.60	0.64	44.4	33.4	7.2	3.2
2012	Madison Bumgarner	Giants	849	8.25	2.12	0.99	47.9	33.3	11.7	3.1
2011	Derek Holland	Rangers	843	7.36	3.05	1.00	46.4	33.6	11.0	3.0
2012	Wandy Rodriguez	– – –	875	6.08	2.45	0.92	48.0	31.6	10.1	2.5
2014	Jason Vargas	Royals	790	6.16	1.97	0.91	38.3	38.7	8.2	2.2
2012	C.J. Wilson	Angels	865	7.70	4.05	0.85	50.3	29.9	10.8	2.2

Cluster 85

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2012	Cliff Lee	Phillies	847	8.83	1.19	1.11	45.0	36.9	11.8	5.0
2014	Cole Hamels	Phillies	829	8.71	2.59	0.62	46.4	31.1	8.2	4.3
2009	Wandy Rodriguez	Astros	849	8.45	2.76	0.92	44.9	37.1	9.9	4.1
2012	Wade Miley	Diamondbacks	807	6.66	1.71	0.65	43.3	33.7	6.9	4.1
2013	Jose Quintana	White Sox	832	7.38	2.52	1.03	42.5	37.4	10.2	3.5
2009	Andy Pettitte	Yankees	834	6.84	3.51	0.92	42.9	37.8	8.9	3.4
2012	Wei-Yin Chen	Orioles	818	7.19	2.66	1.35	37.1	42.1	11.7	2.3

Cluster 86

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2009	Josh Beckett	Red Sox	883	8.43	2.33	1.06	47.2	31.7	12.8	4.2
2010	Max Scherzer	Tigers	800	8.46	3.22	0.92	40.3	40.0	9.6	3.7
2014	Nathan Eovaldi	Marlins	854	6.40	1.94	0.63	44.8	32.9	6.6	2.9
2012	Lucas Harrell	Astros	827	6.51	3.62	0.60	57.2	22.5	9.7	2.8
2013	Jeff Samardzija	Cubs	914	9.01	3.29	1.05	48.2	31.4	13.3	2.7
2011	Max Scherzer	Tigers	833	8.03	2.58	1.34	40.3	39.5	12.6	2.2
2009	Mike Pelfrey	Mets	824	5.22	3.22	0.88	51.3	30.0	9.5	1.7
2011	Roberto Hernandez	Indians	833	5.20	2.86	1.05	54.8	26.6	13.0	0.9

Cluster 92

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2014	Steve Cishek	Marlins	275	11.57	2.89	0.41	42.7	31.1	5.9	2.0
2007	Sean Green	Mariners	304	7.01	4.50	0.26	60.9	18.8	5.1	0.7
2008	Sean Green	Mariners	358	7.06	4.10	0.34	63.3	19.5	6.1	0.7
2011	Shawn Camp	Blue Jays	292	4.34	2.98	0.41	53.5	25.7	5.2	0.3
2010	Shawn Camp	Blue Jays	298	5.72	2.24	1.00	52.0	31.4	11.1	0.2

Cluster 95

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2008	Cliff Lee	Indians	891	6.85	1.37	0.48	45.9	35.1	5.1	6.7
2012	Cole Hamels	Phillies	867	9.03	2.17	1.00	43.4	35.1	11.9	4.6
2013	Cole Hamels	Phillies	905	8.26	2.05	0.86	42.7	36.7	9.1	4.5
2008	Scott Kazmir	Rays	641	9.81	4.14	1.36	30.8	48.9	12.0	2.0

Cluster 97

year	Name	Team	TBF	K9	BB9	HR9	GB_pct	FB_pct	HR_FB	WAR
2011	Jered Weaver	Angels	926	7.56	2.14	0.76	32.5	48.6	6.3	5.7
2009	Jered Weaver	Angels	882	7.42	2.82	1.11	30.9	50.4	8.3	3.9
2014	Chris Tillman	Orioles	871	6.51	2.86	0.91	40.6	39.3	8.3	2.3
2009	Joe Blanton	Phillies	837	7.51	2.72	1.38	40.6	39.5	12.9	2.2
2013	Chris Tillman	Orioles	845	7.81	2.97	1.44	38.6	39.8	14.2	1.9

A Year In xISO

by Andrew Dominijanni

October 15, 2016

For the type of baseball fan I’ve become — one who follows the sport as a whole rather than focuses on a particular team — 2016 was the season of Statcast. Even for those who watch the hometown team’s broadcast on a nightly basis, exit velocity and launch angle have probably become familiar terms. While Statcast was around last season, it seems fans and commentators alike have really embraced it in 2016.

Personally, I commend MLB for democratizing Statcast data, at least partially, especially when they are under no apparent obligation to do so. I’ve enjoyed the Statcast Podcast this season, but most of all, I’ve benefited from the tools available at Baseball Savant. For it is that tool which has allowed me to explore xISO. I first introduced an attempt to incorporate exit velocity into a player’s expected isolated slugging (xISO). I subsequently updated the model and discussed some notable first half players. Alex Chamberlain was kind enough to include my version of xISO in the RotoGraphs x-stats Omnibus, and I’ve been maintaining a daily updated xISO resource ever since.

Happily for science, all of my 2016 first half “Overperformers” saw ISO declines in the second half, while most of my first half “Underperformers” saw large drops in second half playing time. Rather than focus on individuals, though, let’s try to estimate the predictive value of xISO in 2016.

Yuck. This plot shows how well first-half ISO predicted second-half ISO, compared to how well first-half xISO predicted the same, for 2016 first AND second-half qualified hitters. Both of these are calculated using the model as it was at the All-Star break. There are two takeaways: First-half ISO was a pretty bad predictor of second-half ISO, and first-half xISO was also a pretty bad predictor of second-half ISO. Mercifully though, first-half xISO was a bit better than ISO at predicting future ISO. This is consistent with the findings in my first article, and a basic requirement I set out to satisfy.

Now, an interesting thing happened recently. After weeks of hinting, Mike Petriello unveiled “Barrels”. Put simply, Barrels are meant to be a classification of the best kind of batted balls. Shortly thereafter, Baseball Savant began tabulating total Barrels, Barrels per batted ball (Brls/BBE), and Barrels per plate appearance (Brls/PA). In a way, this is similar to Andrew Perpetua’s approach to using granular batted-ball data to track expected outcomes for each batted ball, except that the Statcast folks have taken only a slice of launch angles and exit velocities to report as Barrels.

By definition, these angles and velocities are those for which the expected slugging percentage is over 1.500, so it would appear that this stat could be a direct replacement for my xISO. Not so fast! First of all, because ISO is on a per at-bat (AB) basis, we definitely need to calculate Brls/AB from Brls/PA. This is not so hard if we export a quick FanGraphs leaderboard. Let’s check how well Brls/AB works in a single-predictor linear model for ISO:

Not too bad. The plot reports both R-squared and adjusted R-squared, for comparison with multiple regression models. I won’t show it, but this is almost exactly the coefficient of determination that my original xISO achieves with the same training data. I still notice a hint of nonlinearity, and I bet we can do better.

Hey now, that’s nice. In terms of adjusted R-squared, we’ve picked up about 0.06, which is not insignificant. The correlation plot also looks better to my eye. So what did I do? As is my way, I added a second-order term, and sprinkled in FB% and GB% as predictors. The latter two are perhaps controversial inclusions. FB% and/or GB% might be suspected to be strongly correlated with Brls/AB, introducing some undesired multicollinearity. While I won’t show the plots, it doesn’t actually turn out to be a big problem in this case. Both FB% and GB% have Pearson correlation coefficients close to 0.5 with Brls/AB (negative correlation in the case of GB%). Here’s the functional form of the multiple regression model plotted above, which was trained on all 2016 qualified hitters:

$\mathrm{xISO} = 2.01179\,(\mathrm{Brls/AB}) + 0.12122\,\mathrm{FB} - 0.08887\,\mathrm{GB} - 4.9214\,(\mathrm{Brls/AB})^2 + 0.09044$
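For anyone who wants to plug numbers in, the fitted formula above can be evaluated directly. The sketch below uses the published coefficients as-is; the function names and the example hitter are hypothetical, and it assumes that Brls/AB, FB%, and GB% are all entered as fractions (0.40 rather than 40), which is the scale on which the coefficients give ISO-sized outputs. The first helper does the Brls/PA to Brls/AB conversion mentioned earlier.

```python
def brls_per_ab(brls_per_pa, pa, ab):
    """Convert barrels per plate appearance to barrels per at-bat."""
    return brls_per_pa * pa / ab


def xiso_barrels(brls_ab, fb, gb):
    """Evaluate the Barrels-based regression with the article's coefficients.

    All three inputs are assumed to be fractions, e.g. fb=0.40 for 40% FB%.
    """
    return (2.01179 * brls_ab
            + 0.12122 * fb
            - 0.08887 * gb
            - 4.9214 * brls_ab ** 2
            + 0.09044)


# Hypothetical hitter: 600 PA, 540 AB, 7.2% Brls/PA, 40% FB, 40% GB.
rate = brls_per_ab(0.072, pa=600, ab=540)
print(round(rate, 3), round(xiso_barrels(rate, fb=0.40, gb=0.40), 3))
```

Because of the negative second-order term, the predicted ISO grows more slowly as Brls/AB rises, which matches the hint of nonlinearity noted above.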

To be honest, there is something about my first model that I liked better. This version, using Barrels, feels like a bit of a half-measure between Andrew Perpetua’s bucketed approach and my previous philosophy of using only average exit-velocity values and batted-ball mix. My original intent was to create a metric that could be easily calculated from readily available resources, so in that sense, I’m still succeeding. Going forward, I will be calculating both versions on my spreadsheet. I’m excited to see which version serves the community better heading into 2017!

As always, I’m happy to entertain comments, questions, or criticisms.

Did the Cubs and Giants Have the Best Pitcher-Hitting Series Ever?

by Jim_Turvey

October 14, 2016

With a wild comeback in Game 4 on Tuesday night, the Cubs secured their spot in the NLCS for the second straight season. Considering where the team was just five years ago, this is obviously an impressive achievement. But maybe more impressive is how they reached that second consecutive NLCS. The Cubs scored 17 runs against the Giants in their NLDS showdown, and six of those were driven in by their pitchers! That’s an absurd 35% of the Cubs’ run output coming from the guys who usually do the run prevention.

When Travis Wood hit his incredible home run as a relief pitcher in Game 2, it was the first postseason home run from a pitcher since Joe Blanton took Edwin Jackson deep in Game 4 of the 2008 World Series, and the first postseason home run from a reliever since 1924.

When Jake Arrieta left the yard in the first inning of the very next game, it became the first postseason series with multiple home runs off the bats of pitchers since the 1968 World Series, when Mickey Lolich and Bob Gibson each went deep in a seven-game series. Of course, Lolich and Gibson were rivals, not teammates, making the Wood-Arrieta accomplishment even more impressive, and rare. In fact, it was only the second time in the history of baseball (per Baseball-Reference Play Index) that two pitchers on the same team hit home runs in the same series. The only other time was in the 1924 World Series, when New York Giants teammates, and pitchers, Jack Bentley and Rosy Ryan homered in Games 3 and 5 of the epic seven-game series. Wood and Arrieta were the only ones to do so in back-to-back games.

* * *

Now, it wasn’t just the Cubs pitchers getting in on the fun. For a while Tuesday night, it looked as though Giants starter Matt Moore was going to be a two-fold hero, shutting down the Cubs offense from the mound and knocking in the first run of the game for the Giants in the bottom of the fourth. While that was the only hit from Giants pitchers in the series, it was still enough to set the combined pitcher hitting totals for the two teams at a .250 batting average and a .625 slugging percentage, while knocking in 23 percent of the total runs scored.

Those are some pretty crazy totals, but are they the best ever?

Using the aforementioned Play Index search of all-time postseason home runs from pitchers, there are 18 different series (including the 2016 NLDS) in which a pitcher homered. In those series, on three occasions, the pitcher who hit the home run was the only pitcher to get a hit in the entire series (1984 Rick Sutcliffe, 1978 Steve Carlton, 1975 Don Gullett). Only twice did pitchers combine for more than the 10 total bases from the Giants and Cubs, and only once did they drive in more than the seven runs (and they never topped the percent of runs driven in). Let’s go to the chart:

Top Team Pitcher Performances in the Playoffs

Series	Hits	AB	BA	TB	SLG	RBI	Series runs	RBI as % of runs
2016 NLDS	4	16	0.250	10	0.625	7	30	23.33
2008 WS	2	13	0.154	5	0.385	1	39	2.56
2006 NLCS	2	25	0.080	5	0.200	1	55	1.82
2003 NLCS	3	28	0.107	6	0.214	3	82	3.66
1984 NLCS	4	17	0.235	7	0.412	1	48	2.08
1978 NLCS	2	17	0.118	5	0.294	4	38	10.53
1975 NLCS	2	12	0.167	5	0.417	3	26	11.54
1974 WS	4	20	0.200	8	0.400	1	27	3.70
1970 WS	2	25	0.080	5	0.200	4	53	7.55
1970 ALCS	5	18	0.278	10	0.556	6	37	16.22
1969 WS	5	26	0.192	10	0.385	5	24	20.83
1968 WS	5	36	0.139	11	0.306	4	63	6.35
1967 WS	2	30	0.067	8	0.267	2	46	4.35
1965 WS	5	32	0.156	9	0.281	6	44	13.64
1958 WS	7	37	0.189	10	0.270	8	54	14.81
1940 WS	3	39	0.077	7	0.179	2	50	4.00
1926 WS	4	39	0.103	8	0.205	2	52	3.85
1924 WS	8	42	0.190	14	0.333	5	53	9.43
1920 WS	6	39	0.154	9	0.231	3	29	10.34
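The rate columns in the chart follow directly from the counting columns: batting average is hits per at-bat, slugging is total bases per at-bat, and the last column is pitcher RBI as a share of all runs scored in the series. A minimal sketch (the function name and dictionary keys are mine, not part of the original chart) that reproduces two rows:

```python
def pitcher_batting_line(hits, ab, tb, rbi, series_runs):
    """Derive the chart's rate columns from its counting columns."""
    return {
        "BA": round(hits / ab, 3),
        "SLG": round(tb / ab, 3),
        "pct_of_runs": round(100 * rbi / series_runs, 2),
    }


# Counting stats copied from the 2016 NLDS and 1970 ALCS rows above.
line_2016 = pitcher_batting_line(hits=4, ab=16, tb=10, rbi=7, series_runs=30)
line_1970 = pitcher_batting_line(hits=5, ab=18, tb=10, rbi=6, series_runs=37)
print(line_2016)
print(line_1970)
```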

After a brief perusal, it’s clear that there are only a few cases in which the pitchers in a series can even come close to what we just saw. Let’s take a look at the five best, in ascending order:

1968 World Series

This was one of the three series before the 2016 NLDS in which multiple pitchers hit home runs. In 1968, it was, as noted above, Bob Gibson and Mickey Lolich who homered in the series, one each for the Cardinals and Tigers. The reason this series ranks fifth among the challengers to Cubs-Giants is that those two pitchers were really it. They drove in the only four runs from pitchers in the series (three of the four RBI coming on the two home-run swings), and there was only one hit to come from a non-Gibson/Lolich pitcher.

1969 World Series

Just a year after our first entry into this challenge, the Mets and Orioles played in the first World Series to be preceded by a League Championship Series. The extra-long season didn’t stop the Mets and Orioles pitchers from contributing all over the diamond, however, as they crammed five hits, 10 total bases, and five RBI into just a five-game series. Because of the abbreviated length of the series, this is one of the few series that can challenge the 2016 NLDS in terms of percentages. That being said, the Cubs-Giants pitchers take all three percentage categories, leaving no real room for debate on this one.

1958 World Series

The 1958 series stands out in that it had the highest RBI total for pitchers in any postseason series to date. That was thanks in large part to the top two pitchers for the Braves, Warren Spahn and Lew Burdette, tallying three RBI apiece. Burdette did it with the long ball, while Spahn preferred the death-by-a-thousand-cuts method, tallying his three RBI on four hits in the series. The Yankees got two RBI of their own from Bob Turley, but I’m not quite willing to give these guys the edge over the Cubs-Giants pitchers. The easiest argument for this year’s NLDS is that the Cubs-Giants pitchers tallied as many total bases and only one fewer RBI in three fewer games, as the 1958 World Series went to seven games, while this year’s NLDS went just four games.

1924 World Series

Here’s where the challenge gets real stiff. The 1924 World Series is the other series in which we have two home runs from pitchers, hit by the aforementioned Bentley and Ryan, teammates for the Giants. This series tops our charts in hits (8) and total bases (14), and is a reasonable choice for best-hitting series from a group of pitchers. I’m still giving the edge to Cubs-Giants in this showdown, though, and for a couple of reasons. Actually, really one reason with a couple different explanations: opportunity. Similar to the 1958 World Series, the 1924 World Series went to seven games, meaning that pitchers had far more games to rack up those hits and total bases. Pitchers were also left in games far longer in the 1920s, and as such, tallied almost three times as many at-bats as the 2016 NLDS pitchers. When comparing batting average (.250 to .190) and, even more so, slugging percentage (.625 to .333), it becomes clear that this year’s Cubs-Giants pitchers still reign supreme.

1970 ALCS

Here’s our winner. The only series that I believe tops the recently concluded Cubs-Giants NLDS in terms of output from pitchers at the plate. This was an even shorter series than Cubs-Giants, as the Orioles only needed three games to dispatch the Twins. And their pitchers were a good chunk of the reason why. The Orioles used just four pitchers in the series, but all four got hits, combining for all of the offense you see above. (Twins pitchers were 0-for-5 in the series.) Not only did all four get hits, but all three starters got extra-base hits, as Dave McNally, Jim Palmer, and Mike Cuellar (Dick Hall was the reliever) all showed what they were capable of on the other side of the ball. Of course, the very next season, these three starters, along with Pat Dobson, would form just the second-ever set of four 20-game winners on the same team, proving just how awesome the late ’60s and early ’70s Orioles really were. They reign supreme for now, but let’s see how those Cubs starting pitchers do for the rest of the 2016 playoffs.

Let’s Get the Twins to the World Series

by Beau Horan

October 13, 2016

Imagine for a second that MLB Commissioner Rob Manfred has gone senile. I know that’s a ridiculous premise, and this is sure to be a ridiculous post, but bear with me. Commissioner Manfred, perhaps after a long night of choice MLB-sponsored adult beverages, has placed the Minnesota Twins in the playoffs. Yes, the same Twins of the .364 win percentage and facial hair promotional days. What is the probability that they make or win the World Series? For simplicity, let’s say they take the place of both AL Wild Card teams and are just inserted into the divisional playoffs.

We are going to look at a bunch of ways of estimating the probability the Twins win a five-game series or a seven-game series, then multiply our results accordingly to find an estimate for the team reaching each round. We’ll start simply, and gradually progress to more complicated methods of estimation. Let’s start as simply as possible, then, and use the Twins’ .364 win percentage. The probability of the Twins winning a five-game series (at least three out of five games) is 25.7%. The same process gives them a 22.4% chance of winning a seven-game series. Multiplying these out gives the Twins a 5.8% chance of reaching the World Series (roughly 1 in 17) and a 1.3% chance of winning it. For reference, those are nearly the same odds FanGraphs gave the Mets of reaching/winning the World Series on October 2nd. Of course, those Mets also had to get through the Wild Card round (and the greatest frat boy to ever pitch a playoff game), but failed to do so.
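For anyone who wants to check the arithmetic, here is a minimal sketch of that first method. It assumes every game is an independent coin flip won with the same probability, and it treats winning a series as winning a majority of the scheduled games (which gives the same answer as stopping once the series is clinched). The function name `series_win_prob` is my own label for illustration, not something from a library.

```python
from math import comb

def series_win_prob(p, games):
    """Chance of winning a majority of `games` independent games,
    each won with probability p (a best-of-`games` series)."""
    need = games // 2 + 1  # 3 of 5, or 4 of 7
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(need, games + 1))

p = 0.364                    # Twins' 2016 win percentage, from the post
ds = series_win_prob(p, 5)   # five-game division series
cs = series_win_prob(p, 7)   # seven-game LCS or World Series

reach_ws = ds * cs           # win the division series, then the LCS
win_ws = reach_ws * cs       # then win the World Series too

print(f"five-game series:  {ds:.1%}")
print(f"seven-game series: {cs:.1%}")
print(f"reach World Series: {reach_ws:.1%}")
print(f"win World Series:   {win_ws:.1%}")
```

Swapping in a different `p` shows how quickly these odds climb for a team closer to .500.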

Okay, so maybe you didn’t like that method because we included the Twins’ entire regular season, instead of just including games against playoff teams. Noted, but just understand that the Twins had basically the same win percentage against playoff teams (.365) as their overall percentage. Just to note, I defined playoff teams as the six division winners plus the four wild card teams. Using the Twins’ percentage against playoff teams yields essentially the same probabilities as above.

How else can we attack this problem? Well, the Twins played 162 games this year, which means they have 158 different five-game stretches and 156 seven-game stretches. Over all those five-game rolling “series”, the Twins won at least three games 24.1% of the time, and they won at least four games in 25% of their seven-game tilts. Multiplying those figures out gives them a 6% chance of reaching the World Series and a 1.5% chance of becoming world champs.
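The rolling-stretch count can be sketched as follows. The game log here is a made-up ten-game placeholder, not the Twins’ actual 2016 results, and `rolling_series_win_rate` is a hypothetical helper name; the real exercise would feed in the full 162-game season in order.

```python
def rolling_series_win_rate(results, length):
    """Share of consecutive `length`-game stretches with a winning record.
    `results` is a season in order: 1 for a win, 0 for a loss."""
    need = length // 2 + 1                # 3 of 5, or 4 of 7
    windows = len(results) - length + 1   # 162 games -> 158 five-game stretches
    winning = sum(1 for i in range(windows)
                  if sum(results[i:i + length]) >= need)
    return winning / windows

# Hypothetical placeholder season, NOT real data.
season = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
print(rolling_series_win_rate(season, 5))
```

Note that overlapping stretches share games, so they are not independent samples; this is a rough empirical gauge rather than a clean probability.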

Again, those numbers are unsatisfying because they include all teams, not just the playoff teams. However, removing the non-playoff teams leaves us with a bit of a sample issue because the Twins played only 52 games against playoff teams. So, let’s change the problem slightly: what is the probability that a last-place team can reach, and win, the World Series? The teams I’ll be considering all finished last in their respective divisions: Twins, Athletics, Rays, Braves, Reds, and Padres. Cumulatively, these teams had a win percentage of .412, won 37.4% of their games against playoff teams, won at least three games in 30.6% of their five-game stretches, and won at least four out of seven 29.9% of the time. You can multiply these percentages out and get some answers.
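As one worked case, here is the multiplication for the rolling-stretch rates, under the same bracket assumption used earlier in the post (one five-game series, then two seven-game series). The variable names are mine.

```python
five_game = 0.306    # last-place teams' share of winning five-game stretches
seven_game = 0.299   # their share of winning seven-game stretches

reach_ws = five_game * seven_game   # division series, then LCS
win_ws = reach_ws * seven_game      # then the World Series

print(f"reach World Series: {reach_ws:.1%}")  # about 9.1%
print(f"win World Series:   {win_ws:.1%}")    # about 2.7%
```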

I’m still not satisfied, so there is one more tool I’m gonna break out: a bootstrap simulation. Bootstrapping basically means sampling with replacement, which means every time I randomly choose a game from the sample, that game is thrown back in and has the same exact chance of getting picked again. This resampling with replacement process gives the bootstrap some pretty useful properties that I won’t get into here, but you can check here for more info.

I’m going to put all the games the last-place teams played against playoff teams into a pile. I’m going to randomly sample five games from that pile, with replacement, and count how many games were wins. I’m going to do this 100,000 times. I will then divide the number of samples that included at least three wins by the total number of samples, giving me an estimated probability of these last-place teams winning a five-game series against a playoff team. I will repeat this process for a seven-game series.
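A minimal sketch of that procedure is below. I don’t reproduce the actual pile of games here, so the example builds a placeholder pile of 1,000 results at the 37.4% win rate quoted above; the pile size is an assumption, although with sampling with replacement only the win rate matters. `bootstrap_series_prob` is a hypothetical name, and the seed is there only to make the run repeatable.

```python
import random

def bootstrap_series_prob(pile, games, trials=100_000, seed=None):
    """Estimate the chance of winning a majority of `games` by drawing
    games from `pile` (1 = win, 0 = loss) with replacement."""
    rng = random.Random(seed)
    need = games // 2 + 1
    hits = 0
    for _ in range(trials):
        sample = rng.choices(pile, k=games)  # draws are with replacement
        if sum(sample) >= need:
            hits += 1
    return hits / trials

# Placeholder pile (assumed size), matching the 37.4% rate from the post.
pile = [1] * 374 + [0] * 626

print(bootstrap_series_prob(pile, 5, seed=0))  # five-game series
print(bootstrap_series_prob(pile, 7, seed=0))  # seven-game series
```

Because each draw is independent, the estimates should land close to what the simple binomial calculation gives for a .374 team, with a little simulation noise on top.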

The bootstrap probability of a last-place team winning a five-game series against a playoff team was 27%. The probability of them winning a seven-game series was 24%. They have a 6.5% chance of reaching the World Series and 1.6% chance of winning it.

Honestly, these probabilities are lower than I expected. I have believed in and learned to embrace the randomness of the MLB postseason, and I went into this post expecting the outcome to highlight just how random the postseason really is, even absurdly so. That randomness, though, depends on the extremely small differences between the teams at the top, so inserting teams from the very bottom of the league introduces a level of certainty that would be new to the playoffs. Still, imagine repeating a similar exercise for the NFL or NBA. The 27% or so chance I’d give the Twins of advancing seems much higher than the probability of, say, the Cleveland Browns winning a playoff game if inserted into the postseason.

My methodology was clearly very simple, but intentionally so. I gave no acknowledgement to a home-field advantage adjustment, and I looked only at the team’s W-L record. A more complex method could have taken into consideration Pythagorean Expectation or BaseRuns.

This was a ridiculous post and ultimately a meaningless exercise. The Twins probably couldn’t reach the World Series if they were placed in the playoffs, but I’ll point out that as of this writing (October 10th during Game 3 of Nationals-Dodgers) the Cubs also probably won’t reach the World Series. Baseball is a weird and wonderful sport, and the postseason is the weirdest and most wonderful time of the year. If the Twins could conceivably reach the World Series as currently constructed, don’t think too hard about what’s happening and just enjoy.

53 Things About a 53-Second Finnish Baseball Video

by Mahoney

October 9, 2016

With no baseball being played on this Monday night as I write this, I thought I’d throw this out for a quick fix. Granted, this is baseball as it’s played in Finland:

Below is a second-by-second recap of all the glorious action.

{note – because the Stone-Age author doesn’t know how to post GIFs into an article, you’ll have to pause the video yourself to freeze the action for each of the 53 seconds}

0:01 – Dude in the white-striped uniform way off the plate, obviously trying to avoid catcher’s interference because of the dude in the orange-and-blue uniform.

0:02 – Orange-and-blue apparently spots the pitcher striding towards the pitcher’s mound, which I guess in Finnish is the “tikli”.

0:03 – There’s a “ski” on the back of the hitter’s jersey, so he must be Sami Haapakoski. Not likely to be another Polish guy on a Finnish baseball team.

0:04 – And he’s got his hands backwards. (I’d love to see how he holds a light bulb to screw it in)

0:05 – And now the catcher flips the ball up in the air! A combination hidden-ball trick/quick-pitch.

0:06 – First baseman charging in…Sami charging at the offering, which can only mean…

0:07 – A line drive over the first baseman’s head. Well played Sami!

0:08 – Sami now runs down the THIRD-BASE LINE!!!! (being half-Polish myself I have no more capacity to joke). This means that the runner who’s already there (Jeano Segurannen) has to start running to second.

0:09 – What’s with the water hazard inside the park? I guess with this being Finnish baseball, they’ve replaced right field with a right fjord.

0:10 – I like the greenery in right fjord. Gives it a Wrigley-like ambiance (this is the Obligatory 2016 Cubs Reference™ for this article)

0:11 – Crowd going wild, screaming for Sami to run the bases the right way and not blow a well-earned ground-rule double.

0:12 – Or maybe it’s a ground-rule triple if it gets stuck in the poison ivy. Not sure.

0:13 – Love the hustle on the guy in right fjord. Plays the game the right way, he does.

0:14 – And emerging from behind a tree there’s an umpire, checking to see if the ball lodged in the poison ivy for a triple or into the water for a double….what, the ball’s IN PLAY??!?

0:15 – Yep. The right fjorder (Jonni Damonen) swiftly tosses a relay to one of his fellow outfjorders.

0:16 – Unfortunately, Ryän Raburninnen isn’t known for having the best “handle” in this sport

0:17 – Average water temperatures in Finland are colder than anywhere in the continental USA. That’s because they’re measured in degrees Celsius.

0:18 – Look, there’s Jeano rounding the bases the right way

0:19 – Poor right fjorder takes his second plunge in the last five seconds. Someone please fire up a sauna for ol’ Jonni.

0:20 – And there’s Sami flying like a Finn right behind him. All this fumbling of the frigid fjord-frozen ball in right fjord has allowed them to finally move forward again.

0:21 – Nice flip by the right fjorder. Maybe they should move him to second base, wherever the hell they put that in Finland.

0:22 – Nice use of the split screen for the fielding and baserunning portions of the play. Might catch on for MLB telecasts if they ever tried it.

0:23 – Here comes Sami to his jubilant teammates….

0:24 – …PSYCH!!…

0:25 – …running up the third-base line without him

0:26 – The right fjorder pulls his hypothermic body up Tallinn’s Hill, his efforts having been to no avail.

0:27 – Why are they running out there with their bats? I am so thoroughly confused.

0:28 – Led Zeppelin, the official sponsor of the third-base warning track.

0:29 – Those uniforms make these guys look like a NASCAR pit crew. Waiting for one of them to hand Sami a champagne bottle to spray the place.

0:30 – Some guy in a blue jacket is taking a stroll in from left field, apparently oblivious to all the mayhem.

0:31 – This part of the field is also used for the Finnish Capture The Flag League.

0:32 – Finnish vodka is excellent. Just ask the camera guy.

0:33 – Guy in blue jacket has a helmet on. Must be from a different pit crew.

0:34 – Ebullient Finnish yelling.

0:35 – This part of the field was formerly used by the local Finnish Basketball Association team. The team disbanded once it was discovered that someone forgot to put up an actual basket.

0:36 – The one guy with a green helmet comes towards the camera with his bat in ready position. Must be the team’s enforcer.

0:37 – “HAYYYYY!!!”

0:38 – Another yell sounding like “BASEBALLLL!!!!”

0:39 – Coach about to give Sami a water bottle for all his efforts with the bat and on the basepaths (both clockwise and counterclockwise)

0:40 – Fun fact: one of those long Finnish words on Sami’s uni means “this space available for sale”. I forgot exactly which one it was.

0:41 – At least Sami holds the water bottle correctly.

0:42 – How come there’s no left fjord?

0:43 – Fuzzy blue feet can only mean one thing — a mascot! Wonder who/what they have for mascots in Finland?

0:44 – It’s the love child of these two! Sweet!

0:45 – Not sure what that thing is over the bleachers behind home plate (home Frisbee?). Looks vaguely aerodynamic.

0:46 – Someone obviously has a job that includes coordinating handtowels to these guys’ uniforms. The age of specialization is not merely a North American phenomenon.

0:47 – Because Finnish baseballs are often contaminated with fjord-borne bacteria, used handtowels are the souvenir of choice.

0:48 – Eriko is like… what?

0:49 – Ignoring the two kids waving for the towel in the front, Sami fires a Hail Mary pass for the blonde in the top row.

0:50 – Notice all the parkas and heavy winter clothing on these fans. Although the average game-time temperature in Finland is about 17°C, the temperature on this evening was only 10°C, which is just 10 degrees above the freezing point of the right fjorder’s uniform.

0:51 – Nobody bothered to man the lemonade stand in left field just past the bleachers. Guy in the blue jacket probably just walked off with the lemons.

0:52 – Can the Finnish president override a vimpelin veto?

0:53 – Fun fact: the official logo of Superpesis, the major league of Finnish baseball, is basically the same as the NBC peacock.

Thank you for watching, and have a nice day.

Hardball Retrospective – What Might Have Been – The “Original” 2002 Blue Jays

by DerekBain

October 8, 2016

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition. Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 2002 Toronto Blue Jays

OWAR: 51.4 OWS: 312 OPW%: .572 (93-69)

AWAR: 34.2 AWS: 234 APW%: .481 (78-84)

WARdiff: 17.2 WSdiff: 78
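The summary lines above can be reproduced with a few lines of arithmetic. This sketch assumes a 162-game schedule and assumes the listed record is simply the Pythagorean percentage times games played, rounded to the nearest win; `pythag_record` is a hypothetical helper, and the book’s own calculation may differ.

```python
def pythag_record(pw_pct, games=162):
    """Turn a Pythagorean win percentage into a (wins, losses) record,
    assuming a `games`-game schedule and simple rounding."""
    wins = round(pw_pct * games)
    return wins, games - wins

# 2002 Blue Jays figures as listed in the article.
owar, ows, opw = 51.4, 312, 0.572   # "original" roster
awar, aws, apw = 34.2, 234, 0.481   # "actual" roster

war_diff = round(owar - awar, 1)    # original minus actual WAR
ws_diff = ows - aws                 # original minus actual Win Shares

print(war_diff, ws_diff)
print(pythag_record(opw), pythag_record(apw))
```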

The 2002 “Original” Blue Jays breezed to the American League East title, vanquishing the Yankees by a nine-game margin. Toronto topped the American League in OWAR and OWS. Shawn Green (.285/42/114) registered 110 tallies, achieved his second All-Star appearance and finished fifth in the MVP balloting. Jeff Kent (.313/37/108) drilled 42 doubles and attained a career-high in home runs. Carlos Delgado belted 33 round-trippers and coaxed 102 bases on balls. John Olerud (.300/22/102) laced 39 two-base hits and collected the Gold Glove Award. In the midst of five straight seasons with a batting average above .300, Shannon Stewart sliced 38 doubles and scored 103 runs. Vernon Wells reached the century mark in RBI and added 34 two-base knocks in his first full season. The “Actual” squad featured 2002 AL Rookie of the Year Eric Hinske (.279/24/84) at the hot corner.

Jeff Kent placed forty-eighth among second-sackers in “The New Bill James Historical Baseball Abstract” top 100 player rankings while John Olerud secured the 53rd slot at first base.

Original 2002 Blue Jays Actual 2002 Blue Jays

STARTING LINEUP	POS	OWAR	OWS	STARTING LINEUP	POS	AWAR	AWS
Shannon Stewart	LF	2.37	18.47	Shannon Stewart	LF	2.37	18.47
Vernon Wells	CF	0.83	16.7	Vernon Wells	CF	0.83	16.7
Shawn Green	RF	6.18	32.07	Jose L. Cruz	RF/LF	1.73	12.62
John Olerud	DH/1B	4.64	25.92	Josh Phelps	DH	1.46	9.8
Carlos Delgado	1B	4.76	25.97	Carlos Delgado	1B	4.76	25.97
Jeff Kent	2B	6.04	29.93	Dave Berg	2B	0.18	8.61
Alex S. Gonzalez	SS	2.78	14.36	Chris Woodward	SS	2.17	11.74
Chris Stynes	3B	-0.02	3.46	Eric Hinske	3B	3.8	21.81
Greg Myers	C	0.57	5.57	Tom Wilson	C	0.43	5.88
BENCH	POS	OWAR	OWS	BENCH	POS	AWAR	AWS
Jay Gibbons	RF	0.59	11.97	Raul Mondesi	RF	0.08	6.33
Chris Woodward	SS	2.17	11.74	Orlando Hudson	2B	1.17	5.89
Craig A. Wilson	RF	0.95	10.78	Felipe Lopez	SS	0.08	5.8
Michael Young	2B	-0.63	10.72	Ken Huckaby	C	-1.24	1.78
Josh Phelps	DH	1.46	9.8	Joe Lawrence	2B	-0.83	1.48
Orlando Hudson	2B	1.17	5.89	Dewayne Wise	RF	-0.42	1.39
Felipe Lopez	SS	0.08	5.8	Jayson Werth	RF	0.04	0.77
Brent Abernathy	2B	-0.44	4.99	Homer Bush	2B	-0.27	0.75
Abraham Nunez	2B	0.04	4.88	Darrin Fletcher	C	-0.44	0.64
Cesar Izturis	SS	-0.68	3.77	Brian Lesher	1B	-0.5	0.23
Ryan Thompson	LF	0.14	2.84	Kevin Cash	C	-0.14	0.08
Joe Lawrence	2B	-0.83	1.48	Pedro Swann	DH	-0.18	0
Pat Borders	DH	0.06	0.36
Mike Coolbaugh	3B	-0.17	0.16
Casey Blake	3B	-0.11	0.11
Kevin Cash	C	-0.14	0.08

Roy “Doc” Halladay (19-7, 2.93) warranted his first All-Star invitation and led the American League with 239.1 innings pitched. David “Boomer” Wells compiled 19 victories with a 3.75 ERA. Toronto’s superb bullpen staff was anchored by Billy Koch (3.27, 44 SV) and Jose Mesa (2.97, 45 SV). The setup corps consisted of Steve Karsay (3.26, 12 SV), Ben Weber (7-2, 2.54) and Kelvim Escobar (4.27, 38 SV).

Original 2002 Blue Jays Actual 2002 Blue Jays

ROTATION	POS	OWAR	OWS	ROTATION	POS	AWAR	AWS
Roy Halladay	SP	6.74	21.67	Roy Halladay	SP	6.74	21.67
David Wells	SP	3.99	14.79	Pete Walker	SP	1.85	8.74
Woody Williams	SP	3.2	9.65	Mark Hendrickson	SP	1.23	4.01
Gary Glover	SP	0.03	4.54	Esteban Loaiza	SP	-0.15	3.86
Mark Hendrickson	SP	1.23	4.01	Justin Miller	SP	-0.23	3.4
BULLPEN	POS	OWAR	OWS	BULLPEN	POS	AWAR	AWS
Billy Koch	RP	1.44	18.37	Kelvim Escobar	RP	0.53	9.14
Jose Mesa	RP	1.28	12.4	Cliff Politte	RP	1.05	6.49
Steve Karsay	RP	2.01	11	Corey Thurman	RP	0.54	3.66
Ben Weber	RP	1.33	10.48	Felix Heredia	RP	0.09	3.12
Kelvim Escobar	RP	0.53	9.14	Scott Eyre	RP	0.11	2.83
Mike Timlin	RP	1	8.04	Chris Carpenter	SP	0.41	2.73
Giovanni Carrara	RP	0.62	6.77	Steve Parris	SP	0	1.88
David Weathers	RP	1.02	6.68	Scott Cassidy	RP	-0.43	1.67
Chris Carpenter	SP	0.41	2.73	Dan Plesac	RP	0.33	1.39
Graeme Lloyd	RP	-0.53	1.89	Brian Bowles	RP	0.04	1.37
Scott Cassidy	RP	-0.43	1.67	Jason Kershner	RP	0.12	0.65
Jose Silva	RP	0.11	1.38	Pedro Borbon	RP	-0.07	0.48
Brian Bowles	RP	0.04	1.37	Scott Wiggins	RP	0.05	0.2
Mark Lukasiewicz	RP	0	1.17	Pasqual Coco	RP	-0.13	0
Jim Mann	RP	0.18	1.02	Brian Cooper	SP	-0.59	0
Carlos Almanzar	SW	0.24	0.94	Bob File	RP	-0.47	0
Tom Davey	RP	-0.36	0.17	Brandon Lyon	SP	-0.56	0
Pasqual Coco	RP	-0.13	0	Luke Prokopec	SP	-0.91	0
Bob File	RP	-0.47	0	Mike Smith	SP	-0.45	0
Pat Hentgen	SP	-0.54	0
Brandon Lyon	SP	-0.56	0
Aaron Small	RP	-0.08	0
Mike Smith	SP	-0.45	0
Todd Stottlemyre	SP	-0.38	0

Notable Transactions

Shawn Green

November 8, 1999: Traded by the Toronto Blue Jays with Jorge Nunez (minors) to the Los Angeles Dodgers for Pedro Borbon and Raul Mondesi.

Jeff Kent

August 27, 1992: Traded by the Toronto Blue Jays with a player to be named later to the New York Mets for David Cone. The Toronto Blue Jays sent Ryan Thompson (September 1, 1992) to the New York Mets to complete the trade.

July 29, 1996: Traded by the New York Mets with Jose Vizcaino to the Cleveland Indians for Carlos Baerga and Alvaro Espinoza.

November 13, 1996: Traded by the Cleveland Indians with a player to be named later, Julian Tavarez and Jose Vizcaino to the San Francisco Giants for a player to be named later and Matt Williams. The Cleveland Indians sent Joe Roa (December 16, 1996) to the San Francisco Giants to complete the trade. The San Francisco Giants sent Trent Hubbard (December 16, 1996) to the Cleveland Indians to complete the trade.

John Olerud

December 20, 1996: Traded by the Toronto Blue Jays with cash to the New York Mets for Robert Person.

October 27, 1997: Granted Free Agency.

November 24, 1997: Signed as a Free Agent with the New York Mets.

October 29, 1999: Granted Free Agency.

December 15, 1999: Signed as a Free Agent with the Seattle Mariners.

Billy Koch

December 7, 2001: Traded by the Toronto Blue Jays to the Oakland Athletics for Eric Hinske and Justin Miller.

Honorable Mention

The 1995 Toronto Blue Jays

OWAR: 27.1 OWS: 208 OPW%: .469 (76-86)

AWAR: 25.4 AWS: 168 APW%: .389 (56-88)

WARdiff: 1.7 WSdiff: 40

The “Original” ’95 Jays plodded to a fourth-place finish in the AL East, eleven games behind the Orioles while the horrific “Actuals” placed 30 games behind the Red Sox. David Wells delivered a 16-8 record with a 3.24 ERA and made his first appearance at the Mid-Summer Classic. Jose Mesa (1.13, 46 SV) blossomed in the closer’s role, meriting second place in the Cy Young Award balloting along with a fourth-place finish in the MVP race. Derek Bell pilfered 27 bases and established personal-bests in BA (.334) and OBP (.385). Fellow outfielder Glenallen Hill clubbed 24 long balls and set career-highs with 86 RBI and 25 stolen bases. Geronimo Berroa clubbed 22 taters and knocked in 88 runs. Jeff Kent contributed 20 dingers and John Olerud socked 32 doubles.

On Deck

What Might Have Been – The “Original” 1902 Cubs

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive
