The Giants Don’t Need an Overhaul, But an Upgrade

The Giants started off their 2016 campaign with a 57-33 record before the All-Star break, then finished 87-75. There were plenty of problems in the second half of the season, but ultimately the bullpen sealed the Giants’ fate.

In the first half of the season the combined ERA of the bullpen was 2.27, with 26 saves and a K/9 of 9.7. This being said, they had 42 save opportunities, which means they blew a save 38% of the time. In the second half of the season they combined for a 2.85 ERA, with 17 saves and a K/9 of 8.4. They blew 13 saves in 30 opportunities during the second half, which means they blew a save 43% of the time.
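
As a quick sanity check on those percentages, here is a minimal sketch of the arithmetic. The function name is my own, and it assumes every save opportunity ends in either a save or a blown save.

```python
def blown_save_rate(saves, opportunities):
    """Share of save opportunities that were not converted into saves."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return (opportunities - saves) / opportunities


# Figures quoted above: 26 saves in 42 chances, then 17 saves in 30 chances.
first_half = blown_save_rate(saves=26, opportunities=42)
second_half = blown_save_rate(saves=17, opportunities=30)

print(f"First half:  {first_half:.0%}")   # 16 of 42 blown
print(f"Second half: {second_half:.0%}")  # 13 of 30 blown
```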

The bullpen was heavily criticized in the second half of the season due to the team’s inability to replicate the win rate it posted in the first half. However, the bullpen was only slightly better in the first half than it was in the second half.

To me, the Giants were in dire need of acquiring a threat in the bullpen before the trade deadline approached. They went after Will Smith, who came into the Giants’ pen with a 2.12 ERA, 7.9 K/9 and three blown saves. With the Giants he had an ERA of 2.94, a 12.8 K/9 and one blown save. He was not able to convert a save all season, and although he proved to be a nice piece in the bullpen in hold situations, he was not a guy who could come into the 9th inning and dominate the game.

In the postseason the Giants were 0/2 in save situations and, in their final game against the Cubs, their bullpen collapse was maybe the worst the league has ever seen in the playoffs. However, their rookie Ty Blach came in for 3.2 innings of relief during the postseason and did not allow an earned run. He looked promising at the end of the regular season and pitched well in high-pressure situations during October baseball. It was surprising to see him and Santiago Casilla sit out their final game, as they watched their bullpen drop four runs in the 9th. Furthermore, we saw Clayton Kershaw close the Dodgers’ final game against the Nationals to move on to the NLCS. It would have been interesting to see what kind of performance Madison Bumgarner could have shown the Cubs’ batters in that final inning.

Finally, with the veteran relievers of Javier Lopez, Sergio Romo and Casilla needing new contracts for the 2017 campaign, and the Giants in need of finding someone who can come into a 9th inning and pose a legitimate threat, it will be interesting to see what the team does in the offseason to improve their bullpen. Here are my top five predictions for the Giants’ next closer.

 

#1: Kenley Jansen:

It is unlikely that Aroldis Chapman will be looking for a new home this offseason, as he looks comfortable in Chicago and will have a hard time finding a team with that amount of talent. Jansen, however, may flee from the aging Dodgers, especially if someone is willing to pay. The Giants will have a bit of salary space to work with and would benefit greatly from this signing.

#2: Mark Melancon:

Although Melancon is a few steps below the elite Jansen and Chapman, he showed he can work a 9th inning as well as anyone this season. He may be a bit more team-friendly as far as salary space, and that may be intriguing to the Giants who will be looking to add a heavy-hitting left fielder.

#3: Jonathan Papelbon:

Papelbon was replaced by Melancon for the Nationals’ closing position in the second half of the 2016 season. He had a great first half, and showed he is capable of being a dominant closer in the MLB. However, his fight with Bryce Harper in 2015 and his rough second half of the season may make him a risky candidate. This may lower his cost and if the Giants are unable to sign Jansen or Melancon, they would be smart to see what Papelbon could do for their bullpen.

#4: Derek Law:

Derek Law debuted in 2016 and had a pretty good campaign. With a 2.13 ERA in 55 innings of relief, he may have a shot at being the Giants’ closer. However, it would be unlikely for him to start the 2017 season off as the Giants’ closer, unless they are unable to sign someone to fill that duty this offseason. He is an unlikely candidate, but if he can improve from his 2016 season, there is no reason he would not be able to become a legitimate MLB closer.

#5: Aroldis Chapman:

Chapman will likely return to the Cubs, especially if they make it to the World Series this October. However, he has been on three teams in the past two years, and if the Giants are able to show him more money than the Cubs, they might be able to acquire the hard-throwing lefty. If they do, they might lose the power they need to fill left field but they would come into the 2017 season looking stronger than they did a season ago.


The Non-Decline and Fall of the San Francisco Giants

The Chicago Cubs, hinting that this year they may have magick stronger than The Goat, recently brought the San Francisco Giants’ even-year playoff dominance to an end. It was an offensively offensive series; add the two teams’ OPS together and you’re just 100 points better than David Ortiz. The low-velocity Giants staff struck out a batter an inning, and both lineups walked at a lower rate than the unwalkable Royals. My working theory was that this series represented the final demise of the already waning power of the current edition of the Giants, and that the next chart-topping version of Big Head Bruce and the Monsters would have mostly new musicians. Turns out that this theory is only partially correct.

Your 2016 San Francisco Baseball Giants were actually a little better than the world-beating 2014 squad, at least when resort is had to statistics:

Stat                     2016 (MLB rank)    2014 (MLB rank)
Position Player fWAR     26.7 (4)           23.0 (9)
SP fWAR                  15.0 (5)           10.1 (21)
RP fWAR                  2.1 (22)           1.4 (24)
Position Player wRC+     98 (t12)           99 (9)
SP FIP-                  96 (t7)            104 (19)
RP FIP-                  97 (20)            98 (18)
Run differential/game    +0.51              +0.31

Let’s pause a minute to consider the bullpen numbers, which are the very essence of “meh” both years. The Giants have had the reputation of having a good, cheap bullpen. It’s certainly cheap: Sergio Romo is the plutocrat of the unit at a relatively unimposing $9 million. But “good” is more of a stretch; the Giants relievers have delivered value pretty much consistent with what they’ve been paid.

Some commentators have carpeted Bochy for his bullpen usage during the NLDS, but (perhaps because I’m not actually a Giants fan) I take a longer view. The miscellaneous roadies Big Head Bruce has had to work with will hardly make anyone forget The Nasty Boys, but he has often been able to squeeze value out of them when it’s mattered most. In order to maximize value out of this motley crue (I’m in town all week — try the garlic fries) Bochy has had to be very active in the late innings, and the more decisions any manager has to make, the more that will go wrong.

Giants general manager Brian Sabean has correctly recognized that in Bruce Bochy he employs one of the best tacticians in the game today. Sabean has maximized the value of this skill by handing Bochy a collection of misfit bullpen toys and saying “here, you figure this out.” On most nights Bochy does, but every once in a while he fails, as happened in the star-crossed six-pitcher 9th in Game 4. If you want to see what a bullpen meltdown looks like in graphic form, here it is. (Younger or more sensitive Giants fans are advised not to click on that link.)

My guess is that Bochy has had a few other bad bullpen nights, but most of those have happened when the East Coast was already asleep. When you happen to have a bad night nationwide, people may be a little too inclined to draw definitive conclusions. (I do not cut Buck Showalter this kind of slack. Bochy has a bunch of semi-interchangeable parts that present numerous non-obvious choices. Buck doesn’t.)

But back to our regularly scheduled program: the 2016 Giants were, by most measures, a better squad than the 2014 one. This is a roster that’s peaking, and perhaps fell victim to what will soon be a storied Cubs team, or (more prosaically) to the bad luck inherently possible in a short series. So the Giants can look forward to an extended run of playoff contention!

Or not. The Giants are heading in full sail toward the dragon-pocked part of the map. This is an old team: the Giants have the sixth-oldest set of position players in the majors and the oldest pitching staff. They have just two regular players under 27, Madison Bumgarner (still just 26) and Joe Panik (25). To borrow a Casey Stengel line, in 15 years Bumgarner may be in the Hall of Fame. In 15 years, Joe Panik will be 40.

The Giants’ farm will provide little aid. Their system has just two MLB top-100 prospects, with the best being the positionless Christian Arroyo at #79 (though the excellent Bernie Pleskoff is less hostile to his defense than I am). Austin Slater isn’t in the top 100, but he raked at AAA at age 23 with good plate discipline, so he may be able to fill the outfield spot Angel Pagan is likely to vacate.

On the bright side, the contracts of Jake Peavy and Pagan expire this year, taking $26 million off the books. Romo and Santiago Casilla will be departing for broadcasting careers as well, taking $15 million more of liabilities with them. The Giants need one or two outfielders and starting pitching, but especially with respect to the latter, next year’s free-agent class would make a cow laugh. The 2018 list is a better one, but between now and both free-agent classes likely interposes a new collective bargaining agreement, so there’s enough fog to compel Sabean to operate his lights on low beam.

And the competition isn’t sitting still. Regardless of how the hated Los Angeles Dodgers fare in the NLCS, they are poised to compete for a while. The Rockies have an exciting core of young talent, even if casual Rox fans despair of the team at the moment. The Outlaw A.J. Preller merits a blog post all his own (say, there’s an idea!), and while the Padres seem to have a bit of transmission loss between talent and wins, some improvement there is possible as well, especially if Tyson Ross can make a successful return from thoracic outlet surgery. (What? You say there’s another team in the NL West? Hmm … I’ll research that and get back to you.)

So the Giants may be stalling or even slipping backward in a division where at least two of the teams are making progress. The Giants have a good but mostly older core which could use the kind of help that free agency and prospect trades are unlikely to provide in 2017. So 2016 may indeed be the last gasp of this once-in-a-while mighty franchise, at least for the moment. Sabean has pulled a whole warren of rabbits out of his hat during his long tenure, but in 2017 he’s going to have to dig deep.

Perhaps there will be a powerful goat looking for work …


Dr. Hendricks and Mr. Gray

Randomness and circumstances are important driving forces in everything that happens in the world. Although they usually work hand in hand with our own actions and decisions, they have the ability to pick you up when you hit the jackpot at the casino, or throw you down when your car gets crushed by a falling tree (hopefully you’re comfortably sleeping in your bed when that happens).  They can also be the difference between a pitcher having an average season on the mound, and having an outstanding one. Such is the case with the seasons Jon Gray and Kyle Hendricks had this year.

I’m not going to make the argument that these two pitchers performed equally well this season, with the main differences being random chance and circumstances, because they didn’t. Hendricks was the better pitcher; it just wasn’t the 2.48-run difference their ERAs show. The similarities between the two performances can be summarized in basically two stats. If we take a look at xFIP and SIERA (two important ERA estimators available here at FanGraphs), Hendricks’ numbers of 3.59 and 3.70, respectively, are eerily similar to Gray’s 3.61 and 3.72. From there on, however, the numbers separate abruptly.

Much like Dr. Jekyll and Mr. Hyde represent the good and the bad within a person, Hendricks’ and Gray’s seasons represent two sides of the same coin. On the one hand, circumstantial factors and good fortune turned Hendricks’ very good performance into a historic season, while a different set of circumstances and some bad fortune turned Gray’s good performance into merely an average one. In this piece, we’ll take a look at the factors that influenced these diametrically opposed results.

I’ll start by saying that Kyle Hendricks had a remarkable and impressive season. He had an average strikeout rate (8.05 K/9), didn’t walk many batters (2.08 BB/9), and allowed very few longballs (0.71 HR/9), which resulted in a really good 3.20 FIP that ranked 4th in the majors. His ERA, however, ended up all the way down at 2.13, a whopping 1.07 runs lower than his FIP. Big as that difference is, it’s not all that uncommon, as nearly 2% of individual seasons by starters in the history of the game have had an E-F (ERA minus FIP) of -1.07 or lower. Nonetheless, that difference is hardly sustainable through multiple seasons. In major-league history, out of 2259 pitchers with at least 500 innings pitched, only two had a career E-F below -1.00, and both of them were full-time relievers (in case you’re curious, they are Alan Mills and Al Levine).

On the other side of the spectrum, Jon Gray also had a very solid season. He had an outstanding 9.91 strikeouts per 9 innings (that ranked him 9th among qualifying starters), an average walk rate of 3.16 BB/9, and a solid home-run rate (0.94 HR/9), lower than league average despite pitching half of his innings at Coors Field. His performance was good enough for a 3.60 FIP, but his actual ERA rocketed to 4.61. This 1.01 positive difference is just as unusual as Hendricks’ negative one, as about 2% of individual seasons throughout history have resulted in differences of 1.01 or higher. For visualizing purposes, here’s a table summarizing both pitchers’ numbers.

[Table: Hendricks’ and Gray’s 2016 season numbers]
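
For anyone who wants to check the arithmetic, here is a minimal sketch of the E-F calculation using only the ERA and FIP figures quoted above. The function and variable names are my own.

```python
def era_minus_fip(era, fip):
    """E-F: how far a pitcher's actual ERA sits above (+) or below (-) his FIP."""
    return round(era - fip, 2)


# ERA and FIP as quoted in the text for the 2016 season.
seasons = {
    "Kyle Hendricks": {"era": 2.13, "fip": 3.20},
    "Jon Gray": {"era": 4.61, "fip": 3.60},
}

for name, line in seasons.items():
    print(f"{name}: E-F = {era_minus_fip(line['era'], line['fip']):+.2f}")

# The raw ERA gap between the two pitchers mentioned earlier.
era_gap = round(seasons["Jon Gray"]["era"] - seasons["Kyle Hendricks"]["era"], 2)
print(f"ERA gap: {era_gap:.2f}")
```

A negative E-F means the pitcher allowed fewer earned runs than his FIP would suggest; a positive one means he allowed more.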

So the question still remains: what were the determining factors in these two pitchers having such a massive difference in results? Let’s dive right into it.

First of all, I decided to look at the correlation factors between E-F and a wide array of pitching stats, using data from every pitcher in MLB history with 500+ innings. As a general rule of thumb, a correlation factor with an absolute value between 0.40 and 0.69 indicates a strong relationship between the two variables. The following table shows the stats that had a correlation factor of at least 0.40 in absolute value with E-F:

[Table: stats with a strong correlation with E-F (BABIP, LOB%, and ERA)]

Welp, that’s a pretty lame table. Keep in mind, I analyzed correlations for stats as varied as pitch-type percentages, pitch-type vertical and horizontal movements, and Soft, Medium, and Hard-hit rates, as well as K, BB, and HR per 9, or HR/FB%. None of those had even a moderate relationship with E-F. So let’s stick with the stats presented on the table.
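
The screening step described above can be sketched roughly as follows. This is only an illustration: the five BABIP and E-F values are made up, not real pitcher data, the function names are my own, and I lump anything above the 0.69 upper bound into "strong" for simplicity. The "moderate" band (0.30 to 0.39) is the one used for the second table.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def strength(r):
    """Label a correlation by its absolute value, using the article's bands."""
    size = abs(r)
    if size >= 0.40:  # article's band is 0.40-0.69; higher values are lumped in here
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


# Hypothetical career numbers for five pitchers (illustration only).
babip = [0.270, 0.285, 0.295, 0.305, 0.320]
e_minus_f = [-0.60, -0.10, -0.20, 0.25, 0.50]

r = pearson_r(babip, e_minus_f)
print(f"r = {r:.2f} ({strength(r)})")
```

Note that the sign is ignored when judging strength, so a factor of -0.42 counts as strong just like one of 0.58.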

The first two stats are really no surprise. FIP basically assumes league-average BABIP and LOB% to estimate what a pitcher’s ERA should look like. So, if a pitcher has a high BABIP, FIP is going to estimate a lower ERA than the actual one, resulting in a higher E-F; thus the positive correlation. On the other hand, if a pitcher has a higher LOB%, he’ll allow fewer runs than his FIP would suggest, resulting in a lower E-F. This explains the negative correlation shown in the table. The last stat, however, came as a real surprise, at least for me. ERA seems to be positively correlated with E-F, which means that pitchers with higher ERA tend to have higher E-F than pitchers with lower ERA.

The next logical step would be to determine which factors, if any, explain BABIP and/or LOB% among pitchers. Using the same pitching stats as in the previous step, I ran correlations with BABIP and LOB% separately. The following table shows the stats that had a strong (0.40 to 0.69) or moderate (0.30 to 0.39) relationship.

[Table: stats with a strong or moderate correlation with BABIP and with LOB%]

As was the case in the first table, both of these stats are correlated strongly with E-F, showing factors of 0.58 and -0.42, respectively. It doesn’t come as a shock, either, that they are strongly correlated with each other. The negative correlation factor (-0.42) indicates, as you would expect, that a high BABIP goes with a low LOB%, and vice versa. On the BABIP side, a strong positive relationship with ERA is almost too obvious, as more balls in play falling for hits leads to more runs being scored. Also, since fly balls in play (not counting home runs) turn into outs more often than ground balls do, it makes sense that BABIP holds a negative relationship with the former, and a positive one with the latter. This fact, however, goes against a somewhat popular belief that ground-ball pitchers tend to have lower BABIPs.

The factors that correlate to LOB% are more interesting. The first one is not unexpected: a higher strikeout rate seems to lead to more runners getting stranded, and that’s a pretty easy concept to wrap your head around. The second one, however, is really mind-boggling, and I really can’t say I can find a reasonable explanation for it. It indicates that the higher the home-run rate allowed by a pitcher, the more runners are going to be left on base. It is quite possible that this is just a spurious correlation, having no causality at all. Finally, the last factor listed on the table is very interesting and useful in this particular case. It suggests that high percentages of soft contact lead to higher LOB%. We’ll get to that later on in this article.

So let’s go back to our pitchers and check if any of this makes sense. We know that E-F is mainly affected by BABIP and LOB%. Hendricks and Gray had very different numbers in these two stats. The Cubs’ righty had a .250 BABIP and a LOB% of 81.5, while the Rockies’ fireballer had .308 and 66.4%. Considering that the league averages were .298 and 72.9%, respectively, we can say that Hendricks did considerably better than average, while Gray did just the opposite. So far so good, right? These facts go a long way towards explaining the differing outcomes. However, BABIP and LOB% aren’t exactly pitcher-dependent; in fact, they’re the marquee stats for the generic term “luck.”

Looking at the stats from the second table, few of them help explain the gap. High strikeout rates, for example, are supposed to increase LOB%, but Gray still managed a really low 66.4% despite a 9.91 K/9. On the other hand, Hendricks’ 81.5% LOB ranked 5th among qualified starters, even though his strikeout rate of 8.05 was right around league average. Similarly, groundball percentage is shown to have a positive correlation with BABIP. Nonetheless, Hendricks’ higher-than-average rate of 48.4% (league average was 44.7%) resulted in a ridiculously low BABIP of .250, while Gray’s below-average rate of 43.5% came with a .308 BABIP. Almost the same thing happens when you look at the fly-ball rates.

The only factor from that second table that does make sense in these particular examples is soft-contact rate. Hendricks ranked 1st in this regard among qualified starters, with an impressive 25.1% (league average was 18.8%), while Gray had a below-average rate of 17.8%, which ranked him 50th out of 73 qualified starters. This stat is very much pitcher-dependent, and it does help explain some of the differences in LOB%. It has, however, a moderate relationship with LOB%, as evidenced by its factor of -0.37. Is that enough to account for the massive difference in the results? Intuitively, I’ll say no. There is one more factor, however, that we haven’t even discussed yet.

FIP stands for Fielding Independent Pitching, so the very thing that FIP is trying to subtract from the equation might hold the key to answering our question. Defensive performances can heavily influence the outcome of the game, and make up a big chunk of what we generally call “luck” in a pitcher’s final results. In order to have a numerical confirmation of this idea, I looked at the correlations between each team’s yearly defensive component of WAR and its staff’s BABIP, LOB%, and E-F. The data I used for this exercise was every individual team season from 1989 (the first year in which play-by-play data contained information on hits and outs location) to 2016.

[Table: correlations between a team’s defensive component of WAR and its staff’s BABIP, LOB%, and E-F]

We can see here that a team’s defense has a strong correlation with all three of the stats, especially E-F. Higher values of the defensive component of WAR lead to lower BABIP, higher LOB%, and lower E-F, just as you would expect.

Saying that the Cubs had a great defensive performance this year is an understatement. Not only was it the best defense in 2016 by a bunch, it was also the best defense of the last 17 years, according to FanGraphs’ defensive component of WAR. Of the 814 individual team seasons played in MLB since 1989, this year’s Cubs rank 8th. That’ll put a serious dent in opponents’ BABIP. In fact, the Cubs’ average on balls in play of .255 (yes, that is the whole pitching staff’s BABIP) is the absolute lowest since the ’82 Padres. Oh, and also the Cubs pitching staff’s LOB% of 77.5% is tied for 2nd highest since 1989. All of this adds up to a team E-F of -0.62. Wow. Just wow.

The Rockies defense, on the other hand, wasn’t bad, but it also wasn’t great. According to FanGraphs, it was 17.9 runs above average, which ranked 12th in MLB. Again, that’s really not bad at all, just miles away from the 115.5 runs above average the Cubs had. The Rockies’ staff as a whole had a .317 BABIP and a 68.0% LOB%, which is not unexpected from a team that plays half its games at altitude. Still, both of these values are worse than league average, resulting in a team E-F of 0.54.

All in all, Kyle Hendricks still had a better season than Jon Gray, and people will remember the 2.13 ERA and not the 4.61. This analysis just puts it a little bit more in perspective, and helps shed some light on the little details that make big differences in the course of a long season.

The old football adage says that “defense wins championships.” That doesn’t really apply to baseball, but in the future, when I think back to the 2016 Cubs, I’ll definitely think about their defense.


2016 ALCS Game One: Batter vs. Pitcher Stats

The FanGraphs Twitter page tweeted out a bingo card for Game One of the ALCS. As I looked through it, I thought it was a terrific idea by Michelle Jay and a fun way to follow the game that night. I was going to play along, but then I had another idea. Some slots were much more likely to happen, such as the “Pitcher v hitter stats are mentioned” slot. I figured I would let somebody else receive a t-shirt and just count up exactly how many times the TBS broadcast team mentioned batter vs. pitcher stats. We all know announcers love doing this, and we all know that it’s pretty useless for predicting the outcome of that particular at-bat. I just thought it would be cool to experiment and see how many times they actually mentioned these stats.

First, I’ll just go over the final numbers for batter vs. pitcher stats. There were 65 batters in this game, and batter vs. pitcher stats were either mentioned by the announcers or shown on a graphic for eight of those batters. There were two separate times where they showed a graphic and then mentioned the stats later in the plate appearance, or vice versa. Four of the eight instances occurred when the Jays were hitting against Corey Kluber, three of the eight came when Andrew Miller was pitching, and the last one came when Marco Estrada was on the mound. It’s interesting that they would mention those stats more often when a reliever is pitching, considering the sample size is sure to be even smaller against relievers than against starters.

For fun, I marked each occurrence and tried to quickly type out how the announcer mentioned these stats:

  1. Top 1, Josh Donaldson vs. Corey Kluber: “He’s got some pretty good numbers, 6 for 16 with a jack, so he sees him well” -Cal Ripken
  2. Top 1, Russell Martin vs. Corey Kluber: “Martin is only 2 for 10 in his career against Kluber, both home runs…in fact, two of his last seven off Kluber have been home runs” -Ernie Johnson (graphic added later in the plate appearance reading “2 for last 7 off Kluber with 2 HR”)
  3. Top 2, Michael Saunders vs. Corey Kluber: “Saunders steps in, he’s 3 for 8 in his career against Kluber, and he fouls it off” -Ernie Johnson
  4. Top 6, Michael Saunders vs. Corey Kluber: “Saunders with his two hits, now 5 for 10 off Kluber” -Ron Darling
  5. Bottom 6, Jason Kipnis vs. Marco Estrada: graphic shown reading “0 for 7 4 K VS ESTRADA”
  6. Top 7, Melvin Upton Jr. vs. Andrew Miller: “Upton’s got some numbers against Miller, 5 for 12 with three home runs” -Ron Darling (“That is some numbers” -Cal Ripken)
  7. Top 8, Edwin Encarnacion vs. Andrew Miller: “Encarnacion in his last six at-bats against Miller a couple of home runs and a double” -Ernie Johnson
  8. Top 8, Jose Bautista vs. Andrew Miller: graphic shown reading “.286 (2 for 7) 1 HR 2 BB VS MILLER” (later in the plate appearance: “One of the two hits that Bautista has off Miller…long ball” -Ron Darling)

I’m not trying to knock these announcers by saying that they’re not good at what they do or anything. I would be a terrible announcer. I just think these stats are pretty useless and it was interesting to see how many times they actually mentioned them during a game. Mike Petriello pointed out on Twitter an example of why these numbers aren’t good to look at.

This would be kind of fun to track during the regular season for the really good ones, such as “so and so: 1 for 2 (.500), single career vs. so and so.” Maybe this can be a new metric or something, bpBAAR (batter pitcher Baseball Announcer Above Replacement).


Clustering Pitchers With PITCHf/x

At any point, feel free to scroll down to the bottom to see some of the tables of pitcher clusters.

Clustering Pitches

Clustering individual pitches using data from PITCHf/x is a fairly simple task. All you need to do is pick out the important attributes that you believe define a pitch (velocity, movement, etc.) and use a clustering algorithm, such as K-Means clustering.

With K-Means clustering, you decide what K (the number of clusters) should be. For my analysis, I chose K to be 500 (rather arbitrarily). Different pitch clusters can represent the same type of pitch (e.g., fastball) but with varying attributes. For example, clusters 50 and 100 might both correspond to fastballs, but cluster 50 might be a typical Chris Young fastball whereas cluster 100 might be a typical Aroldis Chapman fastball.
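
To make the idea concrete, here is a minimal pure-Python sketch of K-Means (Lloyd's algorithm) on a handful of made-up pitches. It is not the code behind this analysis: the six pitches, the three features (speed, horizontal movement, vertical movement), and K=2 are all assumptions chosen so the example stays readable. With real PITCHf/x data you would probably use a library implementation and put the features on a common scale first.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the cluster center closest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def k_means(points, k, iterations=50, seed=0):
    """Alternate between assigning points and recomputing centers until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen pitches
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's old center
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Hypothetical pitches: (start_speed in mph, pfx_x, pfx_z).
pitches = [
    (94.0, -5.0, 9.0), (95.5, -4.0, 10.0), (93.0, -6.0, 8.5),   # fastball-like
    (78.0, 6.0, -7.0), (79.5, 5.0, -6.0), (77.0, 7.0, -8.0),    # curveball-like
]

centers, labels = k_means(pitches, k=2)
print(labels)
```

The algorithm only hands back numbered groups. Deciding that one group is "fastballs" and the other "curveballs" is still up to whoever looks at the cluster centers.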

One important point to remember is that you, the analyst, must decide what the clusters represent. By looking at attributes of the pitches in a given cluster, you might identify the cluster as “lefty changeups” or “submariner fastballs” (which is actually a category you will discover).

The Problem of Clustering Pitchers

We can identify every pitch that a pitcher throws as belonging to a cluster from 1 to 500. Therefore, we know the distribution of pitch clusters for a given pitcher. The difficult problem, however, is how do we compare two pitchers using this information? Let’s say we have two pitchers:

  • Pitcher A’s pitches are 50% from cluster 1 and 50% from cluster 200.
  • Pitcher B’s pitches are 33% from cluster 1, 33% from cluster 300, and 33% from cluster 139.

The question remains, are Pitcher A and Pitcher B similar pitchers?
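
One naive way to attack the question, shown here only to illustrate why it is hard, is to measure how much of their pitch mix the two pitchers share cluster by cluster. This is my own sketch and not the method used in the rest of this piece. I treat the three 33% shares as exact thirds.

```python
# Pitch-cluster distributions for the two pitchers described above.
pitcher_a = {1: 0.50, 200: 0.50}
pitcher_b = {1: 1 / 3, 300: 1 / 3, 139: 1 / 3}


def overlap(p, q):
    """Histogram intersection: the share of pitches two distributions have in common."""
    return sum(min(share, q.get(cluster, 0.0)) for cluster, share in p.items())


print(round(overlap(pitcher_a, pitcher_b), 2))  # only cluster 1 is shared
```

The weakness is that this treats cluster 200 and cluster 300 as completely unrelated, even if they happen to be two nearly identical flavors of the same pitch. Exact cluster matches are too blunt an instrument, which is part of what makes comparing pitchers harder than comparing pitches.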

The problem of clustering pitchers is a more complicated one than clustering pitches because we now have a collection of pitches instead of just individual pitches to compare. In order to cluster pitchers, I use a model that is typically used for topic modeling called Latent Dirichlet Allocation (LDA).

An Aside on LDA

In LDA for topic modeling, our data is a collection of documents.

Let’s imagine that our collection of documents is articles from the New York Times. There are global topics that govern how these articles are generated. For example, if you think of a newspaper, the topics might be sports, finance, health, politics, etc. Additionally, each article can be a mixture of these topics. We might imagine there is an article in the sports section titled, “Yankees payroll exceeds $300 million”, which our algorithm may discover is 50% about sports and 50% about finance.

Similar to what is mentioned above, the analyst must figure out what the topics actually are. You do not tell the algorithm that there is a sports topic. You discover that the topic is sports by observing that the most probable words are “baseball”, “Jeter”, “LeBron”, “touchdown”, etc. The algorithm will tell you that a particular document is 50% about topic 1 and 50% about topic 20, but you must ultimately infer what topic 1 and topic 20 are.

I am harping on this point mainly just to mention that there is no magic to these clustering algorithms. An algorithm can cluster data, but it cannot tell you what these clusters mean.

Relevance of LDA to Pitchers

Anyway, how can this model be used to analyze pitchers? We just need to use our imagination. Instead of a collection of documents, we now have a collection of pitcher seasons. Whereas each document is made up of a collection of words, each pitcher season is made up of a collection of pitches. We have already discretized each pitch using K-Means clustering in order to create our own “dictionary” of pitches. In our baseball model, we imagine that each pitcher is a mixture of repertoires, whereas in topic modeling, each document was a mixture of topics. We can then cluster pitchers together by figuring out who has the most similar repertoires.
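
In code, the analogy amounts to building a "bag of pitches" for each pitcher season, the same way topic modeling builds a bag of words for each document. The sketch below uses a made-up dictionary of just four pitch clusters (numbered from 0) and two invented pitcher seasons. The real dictionary has 500 entries.

```python
from collections import Counter


def bag_of_pitches(cluster_ids, vocabulary_size):
    """Count how often each pitch cluster appears in one pitcher season."""
    counts = Counter(cluster_ids)
    return [counts.get(cluster, 0) for cluster in range(vocabulary_size)]


# Hypothetical pitcher seasons, each a sequence of pitch-cluster IDs.
seasons = {
    "Pitcher A 2014": [0, 0, 3, 3, 0, 3],
    "Pitcher B 2014": [0, 1, 2, 1, 2, 0],
}

# One row per pitcher season, one column per pitch cluster.
matrix = {name: bag_of_pitches(ids, vocabulary_size=4) for name, ids in seasons.items()}

for name, row in matrix.items():
    print(name, row)
```

A count matrix of this shape (documents by words, or here pitcher seasons by pitch clusters) is the standard input to an LDA implementation.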

Nitty Gritty Details

If you are not interested in getting into the nitty gritty details, feel free to skip ahead to the next section to just see the cluster groupings.

  • Data used is from 2007-2014.
  • The dictionary of pitches (500 clusters) was created by running K-Means using all of the pitches from 2014. The choice of 2014 is arbitrary, but I used just one year’s worth of data because I thought it might be a sufficient amount and it was much quicker to run K-Means.
  • The PITCHf/x attributes that were used to cluster pitches were start_speed, pfx_x/pfx_z (horizontal/vertical movement), px/pz (horizontal/vertical location), vx0/vz0 (components of velocity).
  • For each pitcher from 2007-2014, each pitch was assigned to its closest cluster (determined by distance to the cluster center). I filtered out pitcher seasons in which the pitcher threw fewer than 500 pitches.
  • I then ran LDA on pitcher seasons, choosing the number of repertoires (topics) to be 5.
  • I used the method from this paper to get a vector representation of each pitcher season. I could have used the inferred repertoire proportions as my vector representations, but for various reasons, this did not produce clusters that were as clean.
  • Finally, I ran K-Means (K=100) on these vectors to get clusters of pitchers.
  • Whereas in topic modeling, it is often interesting to interpret what the global topics actually are, I am not really interested in what the global “repertoires” are for the model. I am really using LDA as a dimensionality reduction technique to produce smaller vectors (5 vs. 500) that can be clustered together.
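
The assignment and filtering steps in the list above can be sketched as follows. This is a simplified illustration rather than my actual code: the two cluster centers, the pitch values, and the season names are invented, only three features are used, and the pitch threshold is lowered from 500 to 3 so the toy data shows the filter at work.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_cluster(pitch, centers):
    """Index of the closest cluster center, i.e. the pitch's dictionary entry."""
    return min(range(len(centers)), key=lambda i: squared_distance(pitch, centers[i]))


def cluster_sequences(pitches_by_season, centers, min_pitches=500):
    """Convert each qualifying pitcher season into a list of pitch-cluster IDs."""
    sequences = {}
    for season, pitches in pitches_by_season.items():
        if len(pitches) < min_pitches:
            continue  # drop seasons with too few pitches
        sequences[season] = [assign_to_cluster(p, centers) for p in pitches]
    return sequences


# Hypothetical cluster centers: (start_speed, pfx_x, pfx_z).
centers = [(94.0, -5.0, 9.0), (78.0, 6.0, -7.0)]

# Hypothetical pitcher seasons.
pitches_by_season = {
    "Starter 2014": [(93.0, -4.5, 8.0), (79.0, 5.5, -6.5), (95.0, -5.5, 9.5)],
    "Call-up 2014": [(92.0, -4.0, 8.0), (77.0, 6.0, -7.0)],
}

sequences = cluster_sequences(pitches_by_season, centers, min_pitches=3)
print(sequences)
```

The resulting sequences of cluster IDs are what get counted up and handed to LDA in the next step.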

Some Observations

The actual clusters along with some relevant FanGraphs statistics are provided below. Each table is sortable. For brevity, I have only included clusters in which there are 10 or fewer pitchers. Only the first cluster shown (cluster 3) has more than 10 pitchers, which I simply included to demonstrate that a cluster could be quite big.

  • As is probably expected, clusters are almost always entirely righties or lefties even though this is not an input to the model.
  • Guys with similar numbers of batters faced cluster together. This is by design, as the way I determined the repertoire proportions accounts for the number of times a particular pitch is thrown.
  • Sometimes weird clusters can form, such as Cluster 37, which contains both Chapman and Wakefield. Cluster 37 is mostly cohesive with hard-throwing left-handers and I believe Wakefield ends up here simply because he did not fit well into any cluster.
  • This is not to say that the algorithm cannot find clusters of knuckleballers. Cluster 14 is all R.A. Dickey from years 2011-2014.
  • There are also other clusters that contain exclusively (or almost exclusively) one pitcher. Cluster 8 is five Kershaw years and one Hamels year. Cluster 68 is five Verlander years. I believe these clusters form partially because their stuff is so good. Another factor is that they might be able to repeat their mechanics so well that they remain in the same cluster because they are always throwing the same pitch types. There are other pitchers who fall almost exclusively into one cluster but who are joined by many other pitchers.
  • Clusters of individual pitchers also happen if a pitcher has an incredibly unique style. Justin Masterson has his own cluster because he is such an extreme ground-ball pitcher. Josh Collmenter does as well due to the extreme rise he generates on his “fastball”.
  • Cluster 29 contains just Kershaw’s 2014 season and J.A. Happ’s 2009 season. If you do a Ctrl-F for J.A. Happ, you will find him in some pretty flattering clusters. This is especially interesting because from 2007-2014, he did not have particularly good seasons, but he has been quite good the last two years. This is not to suggest that these clusters can uncover hidden gems, but it’s not fully out of the realm of possibility.
  • Most clusters produce quite similar ground-ball percentages. One of the factors that goes into clustering pitches (and therefore pitchers) is horizontal and vertical movement, which play a huge role in a pitcher’s ability to produce ground balls.
  • Submarine pitchers always end up together. Check out Clusters 9, 60, and 92.

Overall, I think this is pretty interesting stuff. I was honestly surprised that the clusters turned out to be as cohesive as they were. Additionally, besides being a descriptive tool, I have to wonder whether this information can be used for predictive purposes. For example, we often talk about regression to the mean when discussing a player’s performance, whether it be a pitcher or a batter. It is possible that the appropriate mean for many pitchers is the mean of the cluster that they happen to fall into.
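To make the cluster-mean idea concrete, here is a small sketch that regresses one pitcher season toward the mean of his cluster. The K/9 values are copied from the Cluster 16 table in this post. The blending function and the 0.5 weight are purely hypothetical choices for illustration, and the mean is unweighted (it ignores batters faced).

```python
# K/9 values copied from the Cluster 16 table in this post.
cluster_16_k9 = [10.08, 10.29, 6.85, 11.08, 8.28, 8.00]


def cluster_mean(values):
    """Unweighted mean of a cluster's statistic."""
    return sum(values) / len(values)


def regress_toward(observed, target, weight):
    """Blend an observed value with a target mean.

    `weight` is how much to trust the observed value (0 to 1);
    the value used below is a hypothetical placeholder, not a fitted one.
    """
    return weight * observed + (1 - weight) * target


mean_k9 = cluster_mean(cluster_16_k9)
# Daniel Hudson's 2011 K/9 (6.85), regressed halfway toward the cluster mean.
estimate = regress_toward(6.85, mean_k9, weight=0.5)
print(round(mean_k9, 2), round(estimate, 2))
```

Whether a cluster mean is actually a better regression target than the league mean would have to be tested against out-of-sample seasons.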

Cluster 3

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Chris Carpenter Cardinals 750 6.73 1.78 0.33 55.0 28.0 4.6 5.5
2010 Hiroki Kuroda Dodgers 810 7.29 2.20 0.69 51.1 32.1 8.0 4.3
2010 Gavin Floyd White Sox 798 7.25 2.79 0.67 49.9 32.1 7.6 4.1
2008 Hiroki Kuroda Dodgers 776 5.69 2.06 0.64 51.3 28.6 7.6 3.6
2012 Doug Fister Tigers 673 7.63 2.06 0.84 51.0 26.7 11.6 3.4
2011 Josh Beckett Red Sox 767 8.16 2.42 0.98 40.1 42.2 9.6 3.3
2011 Michael Pineda Mariners 696 9.11 2.89 0.95 36.3 44.8 9.0 3.2
2012 A.J. Burnett Pirates 851 8.01 2.76 0.80 56.9 24.3 12.7 3.0
2013 Rick Porcello Tigers 736 7.22 2.14 0.92 55.3 23.7 14.1 2.9
2008 Carlos Zambrano Cubs 796 6.20 3.43 0.86 47.2 34.9 9.0 2.8
2013 Andrew Cashner Padres 707 6.58 2.42 0.62 52.5 28.7 8.1 2.7
2012 Jeff Samardzija Cubs 723 9.27 2.89 1.03 44.6 33.1 12.8 2.7
2010 Scott Baker Twins 725 7.82 2.27 1.22 35.6 43.5 10.2 2.6
2014 Kyle Gibson Twins 757 5.37 2.86 0.60 54.4 26.6 7.8 2.3
2012 Tim Hudson Braves 749 5.13 2.41 0.60 55.5 25.2 8.3 2.1
2014 Henderson Alvarez Marlins 772 5.34 1.59 0.67 53.8 24.3 9.5 2.1
2008 Todd Wellemeyer Cardinals 807 6.29 2.91 1.17 39.3 39.8 10.6 2.0
2010 Rick Porcello Tigers 700 4.65 2.10 1.00 50.3 32.1 9.9 1.7
2011 Luke Hochevar Royals 835 5.82 2.82 1.05 49.8 32.2 11.5 1.7
2008 Jason Marquis Cubs 738 4.90 3.77 0.81 47.6 32.5 8.3 1.7
2014 Charlie Morton Pirates 666 7.21 3.26 0.51 55.7 22.8 8.8 1.6
2012 Luis Mendoza Royals 709 5.64 3.20 0.81 52.1 27.1 10.6 1.5
2009 Aaron Cook Rockies 675 4.44 2.68 1.08 56.5 24.7 14.2 1.4
2014 Doug Fister Nationals 662 5.38 1.32 0.99 48.9 34.2 10.1 1.4
2010 Mitch Talbot Indians 696 4.97 3.90 0.73 47.8 35.3 7.0 1.2
2008 Armando Galarraga Tigers 746 6.35 3.07 1.41 43.5 39.7 13.0 1.2
2008 Carlos Silva Mariners 689 4.05 1.88 1.17 44.0 33.3 10.4 1.2
2009 Ross Ohlendorf Pirates 725 5.55 2.70 1.27 40.6 42.1 11.1 1.2
2008 Vicente Padilla Rangers 757 6.68 3.42 1.37 42.7 38.1 12.5 1.1
2012 Luke Hochevar Royals 800 6.99 2.96 1.31 43.3 35.0 13.5 1.1
2012 Derek Lowe – – – 640 3.47 3.22 0.63 59.2 21.0 9.1 1.0
2013 Edinson Volquez – – – 777 7.50 4.07 1.00 47.6 29.6 11.9 0.9
2011 Chris Volstad Marlins 719 6.36 2.66 1.25 52.3 27.7 15.5 0.7
2010 Jeremy Bonderman Tigers 754 5.89 3.16 1.32 44.7 39.2 11.4 0.7
2010 Brad Bergesen Orioles 746 4.29 2.70 1.38 48.7 36.6 11.9 0.6
2014 Hector Noesi – – – 733 6.42 2.92 1.46 38.0 40.6 12.7 0.3
2009 Armando Galarraga Tigers 642 5.95 4.20 1.50 39.9 38.6 13.3 0.2
2008 Kyle Kendrick Phillies 722 3.93 3.30 1.33 44.3 28.7 14.0 0.1
2014 Roberto Hernandez – – – 722 5.74 3.99 1.04 49.7 29.9 12.2 0.0
2013 Lucas Harrell Astros 707 5.21 5.15 1.17 51.5 27.4 14.3 -0.8

 

Cluster 5

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 Cliff Lee – – – 843 7.84 0.76 0.68 41.9 40.4 6.3 7.0
2011 Cliff Lee Phillies 920 9.21 1.62 0.70 46.3 32.4 9.0 6.8
2009 Jon Lester Red Sox 843 9.96 2.83 0.89 47.7 34.5 10.6 5.3
2014 Jose Quintana White Sox 830 8.00 2.34 0.45 44.7 33.2 5.1 5.1
2013 Derek Holland Rangers 894 7.99 2.70 0.85 40.8 36.4 8.8 4.3
2012 Matt Moore Rays 759 8.88 4.11 0.91 37.4 42.9 8.6 2.7
2013 Wade Miley Diamondbacks 847 6.53 2.93 0.93 52.0 27.2 12.5 1.8

 

Cluster 6

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2007 CC Sabathia Indians 975 7.80 1.38 0.75 45.0 36.6 7.8 6.4
2014 Jake McGee Rays 274 11.36 2.02 0.25 38.0 42.9 2.9 2.6
2014 Tyler Matzek Rockies 503 6.96 3.37 0.69 49.7 30.3 8.3 1.7
2013 J.A. Happ Blue Jays 415 7.48 4.37 0.97 36.5 46.0 7.6 1.1
2010 J.A. Happ – – – 374 7.21 4.84 0.82 39.0 43.4 7.4 1.0
2009 Sean West Marlins 467 6.10 3.83 0.96 40.2 40.8 8.0 1.0
2009 Andrew Miller Marlins 366 6.64 4.84 0.79 48.0 30.0 9.3 0.7
2012 Drew Pomeranz Rockies 434 7.73 4.28 1.30 43.9 35.9 13.6 0.7
2013 Jake McGee Rays 260 10.77 3.16 1.15 42.5 38.8 12.9 0.6
2008 Jo-Jo Reyes Braves 512 6.21 4.14 1.43 48.5 31.8 15.5 0.2

 

Cluster 8

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Clayton Kershaw Dodgers 908 8.85 1.98 0.42 46.0 31.3 5.8 7.1
2011 Clayton Kershaw Dodgers 912 9.57 2.08 0.58 43.2 38.6 6.7 7.1
2012 Clayton Kershaw Dodgers 901 9.05 2.49 0.63 46.9 34.0 8.1 5.9
2010 Clayton Kershaw Dodgers 848 9.34 3.57 0.57 40.1 42.1 5.8 4.7
2009 Clayton Kershaw Dodgers 701 9.74 4.79 0.37 39.4 41.6 4.1 4.4
2010 Cole Hamels Phillies 856 9.10 2.63 1.12 45.4 37.9 12.3 3.5

 

Cluster 9

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Peter Moylan Braves 309 7.52 4.32 0.00 62.4 19.5 0.0 1.4
2014 Joe Smith Angels 285 8.20 1.81 0.48 59.1 25.9 8.0 1.0
2011 Joe Smith Indians 267 6.04 2.82 0.13 56.6 23.5 2.2 1.0
2009 Brad Ziegler Athletics 313 6.63 3.44 0.25 62.3 19.7 4.4 1.0
2013 Brad Ziegler Diamondbacks 297 5.42 2.71 0.37 70.4 10.8 12.5 0.6
2012 Brad Ziegler Diamondbacks 263 5.50 2.75 0.26 75.5 7.7 13.3 0.6
2012 Joe Smith Indians 278 7.12 3.36 0.54 58.0 24.9 8.3 0.6
2008 Cla Meredith Padres 302 6.27 3.07 0.77 66.8 17.3 15.8 0.3
2010 Peter Moylan Braves 271 7.35 5.23 0.71 67.8 21.3 13.5 -0.3

 

Cluster 14

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 R.A. Dickey Mets 927 8.86 2.08 0.92 46.1 34.1 11.3 5.0
2011 R.A. Dickey Mets 876 5.78 2.33 0.78 50.8 32.9 8.3 2.5
2014 R.A. Dickey Blue Jays 914 7.22 3.09 1.09 42.0 37.6 10.7 1.7
2013 R.A. Dickey Blue Jays 943 7.09 2.84 1.40 40.3 40.5 12.7 1.7

 

Cluster 16

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Max Scherzer Tigers 836 10.08 2.35 0.76 36.3 44.6 7.6 6.1
2014 Max Scherzer Tigers 904 10.29 2.57 0.74 36.7 41.6 7.5 5.2
2011 Daniel Hudson Diamondbacks 921 6.85 2.03 0.69 41.7 39.1 6.4 4.6
2012 Max Scherzer Tigers 787 11.08 2.88 1.10 36.5 41.5 11.6 4.4
2014 Jeff Samardzija – – – 879 8.28 1.76 0.82 50.2 30.5 10.6 4.1
2014 Lance Lynn Cardinals 866 8.00 3.18 0.57 44.3 36.0 6.1 3.4

 

Cluster 18

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Brandon Webb Diamondbacks 944 7.27 2.58 0.52 64.4 20.4 9.6 5.5
2013 Justin Masterson Indians 803 9.09 3.54 0.61 58.0 24.2 10.7 3.5
2012 Justin Masterson Indians 906 6.94 3.84 0.79 55.7 25.0 11.4 2.3
2011 Derek Lowe Braves 830 6.59 3.37 0.67 59.0 22.5 10.2 2.1

 

Cluster 20

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 John Danks White Sox 878 6.85 2.96 0.76 45.4 38.9 7.4 4.4
2010 Brian Matusz Orioles 760 7.33 3.23 0.97 36.2 45.0 7.9 3.0
2009 John Danks White Sox 839 6.69 3.28 1.26 44.2 40.9 11.5 2.7
2013 Felix Doubront Red Sox 705 7.71 3.94 0.72 45.6 34.4 7.8 2.2
2014 J.A. Happ Blue Jays 673 7.58 2.91 1.25 40.6 39.5 11.5 1.0

 

Cluster 24

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 CC Sabathia – – – 1023 8.93 2.10 0.68 46.6 31.7 8.8 7.3
2011 CC Sabathia Yankees 985 8.72 2.31 0.64 46.6 30.3 8.4 6.4
2010 David Price Rays 861 8.11 3.41 0.65 43.7 39.6 6.5 4.2

 

Cluster 29

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Clayton Kershaw Dodgers 749 10.85 1.41 0.41 51.8 29.2 6.6 7.6
2009 J.A. Happ Phillies 685 6.45 3.04 1.08 38.4 42.9 9.5 1.7

 

Cluster 35

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Chris Young Mariners 688 5.89 3.27 1.42 22.3 58.7 8.8 0.1
2014 Marco Estrada Brewers 624 7.59 2.63 1.73 32.7 49.5 13.2 -0.1

 

Cluster 36

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Justin Masterson Indians 908 6.58 2.71 0.46 55.1 26.7 6.3 4.2
2010 Justin Masterson Indians 802 7.00 3.65 0.70 59.9 24.9 10.0 2.3

 

Cluster 37

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 Aroldis Chapman Reds 276 15.32 2.89 0.50 37.3 42.9 7.4 3.3
2009 Matt Thornton White Sox 291 10.82 2.49 0.62 46.4 36.3 7.7 2.3
2008 Matt Thornton White Sox 268 10.29 2.54 0.67 53.0 27.4 10.9 1.7
2012 Drew Smyly Tigers 416 8.52 2.99 1.09 39.9 41.3 10.3 1.7
2008 Clayton Kershaw Dodgers 470 8.36 4.35 0.92 48.0 31.3 11.6 1.5
2008 Tim Wakefield Red Sox 754 5.82 2.98 1.24 35.5 48.9 9.1 1.1
2011 Tim Wakefield Red Sox 677 5.41 2.73 1.45 38.4 45.8 10.5 0.2

 

Cluster 38

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Cliff Lee Phillies 876 8.97 1.29 0.89 44.3 33.3 10.9 5.5
2008 Johan Santana Mets 964 7.91 2.42 0.88 41.2 36.4 9.4 5.3
2010 Jon Lester Red Sox 861 9.74 3.59 0.61 53.6 29.6 8.9 4.8
2012 CC Sabathia Yankees 833 8.87 1.98 0.99 48.2 30.7 12.5 4.7
2008 Jon Lester Red Sox 874 6.50 2.82 0.60 47.5 31.6 7.0 4.1
2013 Hyun-Jin Ryu Dodgers 783 7.22 2.30 0.70 50.6 30.5 8.7 3.6
2014 Wei-Yin Chen Orioles 772 6.59 1.70 1.11 41.0 37.5 10.5 2.4
2010 Jonathan Sanchez Giants 812 9.54 4.47 0.98 41.5 43.7 9.8 2.3
2014 Wade Miley Diamondbacks 866 8.18 3.35 1.03 51.1 28.0 13.9 1.6

 

Cluster 44

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Cole Hamels Phillies 850 8.08 1.83 0.79 52.3 32.6 9.9 4.9
2008 Cole Hamels Phillies 914 7.76 2.10 1.11 39.5 38.7 11.2 4.8
2008 John Danks White Sox 804 7.34 2.63 0.69 42.8 35.4 7.4 4.8
2009 Cole Hamels Phillies 814 7.81 2.00 1.12 40.4 38.7 10.7 3.9
2014 Danny Duffy Royals 606 6.81 3.19 0.72 35.8 46.0 6.1 1.9
2011 J.A. Happ Astros 698 7.71 4.78 1.21 33.0 44.2 10.2 0.6

 

Cluster 46

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 Roy Halladay Phillies 993 7.86 1.08 0.86 51.2 29.7 11.3 6.1
2013 Lance Lynn Cardinals 856 8.84 3.39 0.62 43.1 34.4 7.4 3.7
2008 Mike Pelfrey Mets 851 4.93 2.87 0.54 49.6 29.6 6.3 3.1
2009 A.J. Burnett Yankees 896 8.48 4.22 1.09 42.8 39.2 10.8 3.0
2010 Roberto Hernandez Indians 880 5.31 3.08 0.73 55.6 30.8 8.3 2.6
2009 Derek Lowe Braves 855 5.13 2.91 0.74 56.3 25.8 9.4 2.5
2010 Derek Lowe Braves 824 6.32 2.83 0.84 58.8 22.6 13.1 2.2

 

Cluster 49

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Aroldis Chapman Reds 202 17.67 4.00 0.17 43.5 34.8 4.2 2.8
2014 James Paxton Mariners 303 7.18 3.53 0.36 54.8 22.6 6.4 1.2
2013 Rex Brothers Rockies 281 10.16 4.81 0.67 48.8 32.5 9.3 0.9
2012 Antonio Bastardo Phillies 224 14.02 4.50 1.21 27.7 50.0 12.5 0.8
2012 Tim Collins Royals 295 12.01 4.39 1.03 40.9 42.8 11.8 0.7
2012 Christian Friedrich Rockies 377 7.87 3.19 1.49 42.2 34.6 15.4 0.7
2013 Justin Wilson Pirates 295 7.21 3.42 0.49 53.0 30.0 6.7 0.6
2011 Aroldis Chapman Reds 207 12.78 7.38 0.36 52.7 30.8 7.1 0.5
2014 Justin Wilson Pirates 256 9.15 4.50 0.60 51.3 34.4 7.3 0.2
2011 Mike Dunn Marlins 267 9.71 4.43 1.29 38.5 46.0 12.2 -0.2

 

Cluster 51

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Cliff Lee – – – 969 7.03 1.67 0.66 41.3 36.5 6.5 6.3
2009 CC Sabathia Yankees 938 7.71 2.62 0.70 42.9 37.3 7.4 5.9
2010 CC Sabathia Yankees 970 7.46 2.80 0.76 50.7 34.1 8.6 5.1

 

Cluster 54

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Hisashi Iwakuma Mariners 709 7.74 1.06 1.01 50.2 28.7 13.2 3.1
2009 Justin Masterson – – – 568 8.28 4.18 0.84 53.6 31.4 10.4 1.5
2014 Justin Masterson – – – 592 8.11 4.83 0.84 58.2 21.6 14.6 0.4

 

Cluster 58

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 David Price – – – 1009 9.82 1.38 0.91 41.2 38.1 9.7 6.0
2014 Jon Lester – – – 885 9.01 1.97 0.66 42.4 37.0 7.2 5.6
2012 Gio Gonzalez Nationals 822 9.35 3.43 0.41 48.2 30.0 5.8 5.0
2011 David Price Rays 918 8.75 2.53 0.88 44.3 36.9 9.7 4.4
2013 Gio Gonzalez Nationals 819 8.83 3.50 0.78 43.9 33.3 9.7 3.2
2011 Gio Gonzalez Athletics 864 8.78 4.05 0.76 47.5 34.1 8.9 3.1
2010 Gio Gonzalez Athletics 851 7.67 4.13 0.67 49.3 35.3 7.4 3.1

 

Cluster 60

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Brad Ziegler – – – 239 6.79 2.93 0.00 68.6 13.4 0.0 1.0
2007 Cla Meredith Padres 342 6.67 1.92 0.68 72.0 13.6 17.1 1.0
2008 Brad Ziegler Athletics 229 4.53 3.32 0.30 64.7 18.8 6.3 0.5
2013 Joe Smith Indians 259 7.71 3.29 0.71 49.1 30.1 9.6 0.5
2008 Chad Bradford – – – 241 2.58 2.28 0.46 66.5 16.0 9.4 0.4
2012 Cody Eppley Yankees 194 6.26 3.33 0.59 60.3 19.1 11.1 0.3
2008 Joe Smith Mets 271 7.39 4.41 0.57 62.6 17.9 12.5 0.3
2009 Cla Meredith – – – 283 5.10 3.44 0.55 62.9 21.1 8.9 0.2
2010 Brad Ziegler Athletics 257 6.08 4.15 0.59 54.4 26.9 8.2 0.1
2014 Brad Ziegler Diamondbacks 281 7.25 3.22 0.67 63.8 18.9 13.5 0.1

 

Cluster 68

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Justin Verlander Tigers 982 10.09 2.36 0.75 36.0 42.8 7.4 7.7
2012 Justin Verlander Tigers 956 9.03 2.27 0.72 42.3 35.6 8.3 6.8
2011 Justin Verlander Tigers 969 8.96 2.04 0.86 40.2 42.1 8.8 6.4
2010 Justin Verlander Tigers 925 8.79 2.85 0.56 41.0 40.3 5.6 6.3
2013 Justin Verlander Tigers 925 8.95 3.09 0.78 38.4 38.9 7.8 4.9

 

Cluster 69

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Manny Parra Brewers 741 7.97 4.07 0.98 51.6 26.6 13.5 2.3
2014 Drew Smyly – – – 618 7.82 2.47 1.06 36.6 43.4 9.5 2.2
2012 J.A. Happ – – – 627 8.96 3.48 1.18 44.0 38.9 11.9 1.9

 

Cluster 70

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Gerrit Cole Pirates 571 9.00 2.61 0.72 49.2 31.8 9.4 2.3
2009 Luke Hochevar Royals 631 6.67 2.90 1.45 46.6 35.8 13.8 1.0
2012 Joe Kelly Cardinals 457 6.31 3.03 0.84 51.7 27.5 11.0 0.9
2008 Sidney Ponson – – – 612 3.85 3.18 0.93 54.5 26.2 10.9 0.9
2013 Joe Kelly Cardinals 532 5.73 3.19 0.73 51.1 28.2 8.9 0.7
2009 Roberto Hernandez Indians 596 5.67 5.03 1.15 55.2 27.0 13.7 0.0

 

Cluster 71

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Chris Young Padres 434 8.18 4.22 1.14 21.7 53.4 8.7 1.4
2012 Chris Young Mets 493 6.26 2.82 1.25 22.3 58.2 7.7 1.2
2013 Josh Collmenter Diamondbacks 384 8.32 3.23 0.78 32.7 46.8 6.9 1.0
2012 Josh Collmenter Diamondbacks 375 7.97 2.19 1.30 37.4 43.1 11.5 0.8
2009 Chris Young Padres 336 5.92 4.74 1.42 30.2 51.7 10.0 0.0

 

Cluster 72

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Madison Bumgarner Giants 873 9.07 1.78 0.87 44.4 35.8 10.0 4.0
2013 Jon Lester Red Sox 903 7.47 2.83 0.80 45.0 35.4 8.3 3.5

 

Cluster 77

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Josh Collmenter Diamondbacks 621 5.83 1.63 0.99 33.3 47.0 7.7 2.3
2014 Josh Collmenter Diamondbacks 719 5.77 1.96 0.90 38.8 39.9 8.3 1.9

 

Cluster 78

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2007 Rich Hill Cubs 812 8.45 2.91 1.25 36.0 42.9 11.7 3.1
2014 Tyler Skaggs Angels 464 6.85 2.39 0.72 50.1 30.9 8.7 1.5
2011 Danny Duffy Royals 474 7.43 4.36 1.28 37.5 40.3 11.5 0.5
2010 Manny Parra Brewers 560 9.52 4.65 1.33 47.2 34.5 14.8 0.3

 

Cluster 79

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 David Price Rays 836 8.74 2.52 0.68 53.1 27.0 10.5 5.0
2011 C.J. Wilson Rangers 915 8.30 2.98 0.64 49.3 31.9 8.2 4.9
2010 C.J. Wilson Rangers 850 7.50 4.10 0.44 49.2 33.5 5.3 4.1
2013 C.J. Wilson Angels 913 7.97 3.60 0.64 44.4 33.4 7.2 3.2
2012 Madison Bumgarner Giants 849 8.25 2.12 0.99 47.9 33.3 11.7 3.1
2011 Derek Holland Rangers 843 7.36 3.05 1.00 46.4 33.6 11.0 3.0
2012 Wandy Rodriguez – – – 875 6.08 2.45 0.92 48.0 31.6 10.1 2.5
2014 Jason Vargas Royals 790 6.16 1.97 0.91 38.3 38.7 8.2 2.2
2012 C.J. Wilson Angels 865 7.70 4.05 0.85 50.3 29.9 10.8 2.2

 

Cluster 85

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 Cliff Lee Phillies 847 8.83 1.19 1.11 45.0 36.9 11.8 5.0
2014 Cole Hamels Phillies 829 8.71 2.59 0.62 46.4 31.1 8.2 4.3
2009 Wandy Rodriguez Astros 849 8.45 2.76 0.92 44.9 37.1 9.9 4.1
2012 Wade Miley Diamondbacks 807 6.66 1.71 0.65 43.3 33.7 6.9 4.1
2013 Jose Quintana White Sox 832 7.38 2.52 1.03 42.5 37.4 10.2 3.5
2009 Andy Pettitte Yankees 834 6.84 3.51 0.92 42.9 37.8 8.9 3.4
2012 Wei-Yin Chen Orioles 818 7.19 2.66 1.35 37.1 42.1 11.7 2.3

 

Cluster 86

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Josh Beckett Red Sox 883 8.43 2.33 1.06 47.2 31.7 12.8 4.2
2010 Max Scherzer Tigers 800 8.46 3.22 0.92 40.3 40.0 9.6 3.7
2014 Nathan Eovaldi Marlins 854 6.40 1.94 0.63 44.8 32.9 6.6 2.9
2012 Lucas Harrell Astros 827 6.51 3.62 0.60 57.2 22.5 9.7 2.8
2013 Jeff Samardzija Cubs 914 9.01 3.29 1.05 48.2 31.4 13.3 2.7
2011 Max Scherzer Tigers 833 8.03 2.58 1.34 40.3 39.5 12.6 2.2
2009 Mike Pelfrey Mets 824 5.22 3.22 0.88 51.3 30.0 9.5 1.7
2011 Roberto Hernandez Indians 833 5.20 2.86 1.05 54.8 26.6 13.0 0.9

 

Cluster 92

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Steve Cishek Marlins 275 11.57 2.89 0.41 42.7 31.1 5.9 2.0
2007 Sean Green Mariners 304 7.01 4.50 0.26 60.9 18.8 5.1 0.7
2008 Sean Green Mariners 358 7.06 4.10 0.34 63.3 19.5 6.1 0.7
2011 Shawn Camp Blue Jays 292 4.34 2.98 0.41 53.5 25.7 5.2 0.3
2010 Shawn Camp Blue Jays 298 5.72 2.24 1.00 52.0 31.4 11.1 0.2

 

Cluster 95

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Cliff Lee Indians 891 6.85 1.37 0.48 45.9 35.1 5.1 6.7
2012 Cole Hamels Phillies 867 9.03 2.17 1.00 43.4 35.1 11.9 4.6
2013 Cole Hamels Phillies 905 8.26 2.05 0.86 42.7 36.7 9.1 4.5
2008 Scott Kazmir Rays 641 9.81 4.14 1.36 30.8 48.9 12.0 2.0

 

Cluster 97

 

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Jered Weaver Angels 926 7.56 2.14 0.76 32.5 48.6 6.3 5.7
2009 Jered Weaver Angels 882 7.42 2.82 1.11 30.9 50.4 8.3 3.9
2014 Chris Tillman Orioles 871 6.51 2.86 0.91 40.6 39.3 8.3 2.3
2009 Joe Blanton Phillies 837 7.51 2.72 1.38 40.6 39.5 12.9 2.2
2013 Chris Tillman Orioles 845 7.81 2.97 1.44 38.6 39.8 14.2 1.9

 


A Year In xISO

For the type of baseball fan I’ve become — one who follows the sport as a whole rather than focuses on a particular team — 2016 was the season of Statcast. Even for those who watch the hometown team’s broadcast on a nightly basis, exit velocity and launch angle have probably become familiar terms. While Statcast was around last season, it seems fans and commentators alike have really embraced it in 2016.

Personally, I commend MLB for democratizing Statcast data, at least partially, especially when they are under no apparent obligation to do so. I’ve enjoyed the Statcast Podcast this season, but most of all, I’ve benefited from the tools available at Baseball Savant. For it is that tool which has allowed me to explore xISO. I first introduced an attempt to incorporate exit velocity into a player’s expected isolated slugging (xISO). I subsequently updated the model and discussed some notable first half players. Alex Chamberlain was kind enough to include my version of xISO in the RotoGraphs x-stats Omnibus, and I’ve been maintaining a daily updated xISO resource ever since.

Happily for science, all of my 2016 first half “Overperformers” saw ISO declines in the second half, while most of my first half “Underperformers” saw large drops in second half playing time. Rather than focus on individuals, though, let’s try to estimate the predictive value of xISO in 2016.

Yuck. This plot shows how well first-half ISO predicted second-half ISO, compared to how well first-half xISO predicted the same, for 2016 first AND second-half qualified hitters. Both of these are calculated using the model as it was at the All-Star break. There are two takeaways: First-half ISO was a pretty bad predictor of second-half ISO, and first-half xISO was also a pretty bad predictor of second-half ISO. Mercifully though, first-half xISO was a bit better than ISO at predicting future ISO. This is consistent with the findings in my first article, and a basic requirement I set out to satisfy.

Now, an interesting thing happened recently. After weeks of hinting, Mike Petriello unveiled “Barrels”. Put simply, Barrels are meant to be a classification of the best kind of batted balls. Shortly thereafter, Baseball Savant began tabulating total Barrels, Barrels per batted ball (Brls/BBE), and Barrels per plate appearance (Brls/PA). In a way, this is similar to Andrew Perpetua’s approach to using granular batted-ball data to track expected outcomes for each batted ball, except that the Statcast folks have taken only a slice of launch angles and exit velocities to report as Barrels.

By definition, these angles and velocities are those for which the expected slugging percentage is over 1.500, so it would appear that this stat could be a direct replacement for my xISO. Not so fast! First of all, because ISO is on a per at-bat (AB) basis, we definitely need to calculate Brls/AB from Brls/PA. This is not so hard if we export a quick FanGraphs leaderboard. Let’s check how well Brls/AB works in a single-predictor linear model for ISO:
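The Brls/PA to Brls/AB conversion is just a rescaling by PA/AB. A minimal sketch, with a made-up hitter (the function name and the sample numbers are illustrative only):

```python
def barrels_per_ab(brls_per_pa, pa, ab):
    """Convert Barrels per plate appearance into Barrels per at-bat."""
    if ab <= 0:
        raise ValueError("ab must be positive")
    # Total barrels = brls_per_pa * pa; divide by at-bats instead of PA.
    return brls_per_pa * pa / ab


# Hypothetical hitter: 6.0% Brls/PA over 600 PA and 540 AB.
rate = barrels_per_ab(0.06, pa=600, ab=540)
print(round(rate, 4))
```

Because AB is never larger than PA, Brls/AB is always at least as large as Brls/PA.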

Not too bad. The plot reports both R-squared and adjusted R-squared, for comparison with multiple regression models. I won’t show it, but this is almost exactly the coefficient of determination that my original xISO achieves with the same training data. I still notice a hint of nonlinearity, and I bet we can do better.
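For reference, adjusted R-squared penalizes R-squared for the number of predictors, which is what makes it the fairer number when comparing a one-predictor model to a multi-predictor one. A short sketch of the standard formula (the sample inputs are arbitrary, not values from the plots):

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof


# Same raw R^2, different model sizes (arbitrary illustrative inputs).
one_predictor = adjusted_r2(0.5, n_obs=101, n_predictors=1)
four_predictors = adjusted_r2(0.5, n_obs=101, n_predictors=4)
print(round(one_predictor, 4), round(four_predictors, 4))
```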

Hey now, that’s nice. In terms of adjusted R-squared, we’ve picked up about 0.06, which is not insignificant. The correlation plot also looks better to my eye. So what did I do? As is my way, I added a second-order term, and sprinkled in FB% and GB% as predictors. The latter two are perhaps controversial inclusions. FB% and/or GB% might be suspected to be strongly correlated with Brls/AB, introducing some undesired multicollinearity. While I won’t show the plots, it doesn’t actually turn out to be a big problem in this case. Both FB% and GB% have Pearson correlation coefficients close to 0.5 with Brls/AB (negative correlation in the case of GB%). Here’s the functional form of the multiple regression model plotted above, which was trained on all 2016 qualified hitters:
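Going only by the description above (a second-order term in Brls/AB, plus FB% and GB%), the model presumably has the general shape below. The coefficient symbols are placeholders, and no fitted values are implied. Whether any further terms, such as interactions, were included is not stated, so this is a sketch of the form rather than the exact equation.

```latex
\[
\mathrm{xISO} \;=\; \beta_0 \;+\; \beta_1\, b \;+\; \beta_2\, b^{2}
\;+\; \beta_3\, \mathrm{FB\%} \;+\; \beta_4\, \mathrm{GB\%},
\qquad b = \mathrm{Brls/AB}
\]
```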

To be honest, there is something about my first model that I liked better. This version, using Barrels, feels like a bit of a half-measure between Andrew Perpetua’s bucketed approach and my previous philosophy of using only average exit-velocity values and batted-ball mix. My original intent was to create a metric that could be easily calculated from readily available resources, so in that sense, I’m still succeeding. Going forward, I will be calculating both versions on my spreadsheet. I’m excited to see which version serves the community better heading into 2017!

As always, I’m happy to entertain comments, questions, or criticisms.


Did the Cubs and Giants Have the Best Pitcher-Hitting Series Ever?

With a wild comeback in Game 4 on Tuesday night, the Cubs secured their spot in the NLCS for the second straight season. Considering where the team was just five years ago, this is obviously an impressive achievement. But maybe more impressive is how they reached that second consecutive NLCS. The Cubs scored 17 runs against the Giants in their NLDS showdown, and six of those were driven in by their pitchers! That’s an absurd 35% of the Cubs’ run output coming from the guys who usually do the run prevention.

When Travis Wood hit his incredible home run as a relief pitcher in Game 2, it was the first postseason home run from a pitcher since Joe Blanton took Edwin Jackson deep in Game 4 of the 2008 World Series, and the first postseason home run from a reliever since 1924.

When Jake Arrieta left the yard in the first inning of the very next game, it became the first postseason series with multiple home runs off the bats of pitchers since the 1968 World Series, when Mickey Lolich and Bob Gibson each went deep in a seven-game series. Of course, Lolich and Gibson were rivals, not teammates, making the Wood-Arrieta accomplishment even more impressive, and rare. In fact, it was only the second time in the history of baseball (per Baseball-Reference Play Index) that two pitchers on the same team hit home runs in the same series. The only other time was in the 1924 World Series, when New York Giants teammates and pitchers Jack Bentley and Rosy Ryan homered in Games 3 and 5 of the epic seven-game series. Wood and Arrieta were the only ones to do so in back-to-back games.

* * *

Now, it wasn’t just the Cubs pitchers getting in on the fun. For a while Tuesday night, it looked as though Giants starter Matt Moore was going to be a two-fold hero, shutting down the Cubs offense from the mound and knocking in the first run of the game for the Giants in the bottom of the fourth. While that was the only hit from Giants pitchers in the series, it was still enough to bring the combined hitting totals for the two teams’ pitchers to a .250 batting average and a .625 slugging percentage, while knocking in 23 percent of the total runs scored.

Those are some pretty crazy totals, but are they the best ever?

Using the aforementioned Play Index search of all-time postseason home runs from pitchers, there are 18 different series (including the 2016 NLDS) in which a pitcher homered. In those series, on three occasions, the pitcher who hit the home run was the only pitcher to get a hit in the entire series (1984 Rick Sutcliffe, 1978 Steve Carlton, 1975 Don Gullett). Only twice did pitchers combine for more than the 10 total bases from the Giants and Cubs, and only once did they drive in more than the seven runs (and they never topped the percent of runs driven in). Let’s go to the chart:

Top Team Pitcher Performances in the Playoffs

Series Hits AB BA TB SLG RBI Series_runs Pct_of_runs
2016 NLDS 4 16 0.250 10 0.625 7 30 23.33
2008 WS 2 13 0.154 5 0.385 1 39 2.56
2006 NLCS 2 25 0.080 5 0.200 1 55 1.82
2003 NLCS 3 28 0.107 6 0.214 3 82 3.66
1984 NLCS 4 17 0.235 7 0.412 1 48 2.08
1978 NLCS 2 17 0.118 5 0.294 4 38 10.53
1975 NLCS 2 12 0.167 5 0.417 3 26 11.54
1974 WS 4 20 0.200 8 0.400 1 27 3.70
1970 WS 2 25 0.080 5 0.200 4 53 7.55
1970 ALCS 5 18 0.278 10 0.556 6 37 16.22
1969 WS 5 26 0.192 10 0.385 5 24 20.83
1968 WS 5 36 0.139 11 0.306 4 63 6.35
1967 WS 2 30 0.067 8 0.267 2 46 4.35
1965 WS 5 32 0.156 9 0.281 6 44 13.64
1958 WS 7 37 0.189 10 0.270 8 54 14.81
1940 WS 3 39 0.077 7 0.179 2 50 4.00
1926 WS 4 39 0.103 8 0.205 2 52 3.85
1924 WS 8 42 0.190 14 0.333 5 53 9.43
1920 WS 6 39 0.154 9 0.231 3 29 10.34
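The rate columns in the table follow directly from the counting columns: BA is hits over at-bats, SLG is total bases over at-bats, and the last column is pitcher RBI as a percentage of all runs scored in the series. A quick sketch that recomputes two rows (the function name is just for illustration; the inputs are taken from the table):

```python
def pitcher_line(hits, ab, tb, rbi, series_runs):
    """Recompute the rate columns for one series from its counting stats."""
    return {
        "BA": round(hits / ab, 3),
        "SLG": round(tb / ab, 3),
        "pct_of_runs": round(100 * rbi / series_runs, 2),
    }


# Counting stats copied from the 2016 NLDS and 1970 ALCS rows.
nlds_2016 = pitcher_line(hits=4, ab=16, tb=10, rbi=7, series_runs=30)
alcs_1970 = pitcher_line(hits=5, ab=18, tb=10, rbi=6, series_runs=37)
print(nlds_2016)
print(alcs_1970)
```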

After a brief perusal, it’s clear that there are only a few cases in which the pitchers in a series can even come close to what we just saw. Let’s take a look at the five best, in ascending order:

1968 World Series

This was one of the three series before the 2016 NLDS in which multiple pitchers hit home runs. In 1968, it was, as noted above, Bob Gibson and Mickey Lolich who homered in the series, one each for the Cardinals and Tigers. The reason this series ranks fifth among the challengers to Cubs-Giants is that those two pitchers were really it. They drove in the only four runs from pitchers in the series (three of the four RBI coming on the two home-run swings), and only one hit came from a non-Gibson/Lolich pitcher.

1969 World Series

Just a year after our first entry into this challenge, the Mets and Orioles played in the first World Series to be preceded by a League Championship Series. The extra-long season didn’t stop the Mets and Orioles pitchers from contributing all over the diamond, however, as they crammed five hits, 10 total bases, and five RBI into just a five-game series. Because of the abbreviated length of the series, this is one of the few series that can challenge the 2016 NLDS in terms of percentages. That being said, the Cubs-Giants pitchers take all three percentage categories, leaving no real room for debate on this one.

1958 World Series

The 1958 series stands out in that it was the highest RBI total for pitchers in any postseason series to date. That was thanks in large part to the top two pitchers for the Braves, Warren Spahn and Lew Burdette, tallying three RBI apiece. Burdette did it with the long ball, while Spahn preferred the death-by-a-thousand-cuts method, tallying his three RBI on four hits in the series. The Yankees got two RBI of their own from Bob Turley, but I’m not quite willing to give these guys the edge over the Cubs-Giants pitchers. The easiest argument for this year’s NLDS is that the Cubs-Giants pitchers tallied as many total bases and only one fewer RBI in three fewer games, as the 1958 World Series went to seven games, while this year’s NLDS went just four games.

1924 World Series

Here’s where the challenge gets real stiff. The 1924 World Series is the other series in which we have two home runs from pitchers, the aforementioned Giants teammates Bentley and Ryan. This series tops our charts in hits (8) and total bases (14), and is a reasonable choice for best-hitting series from a group of pitchers. I’m still giving the edge to Cubs-Giants in this showdown, though, and for a couple of reasons. Actually, really one reason with a couple different explanations: opportunity. Similar to the 1958 World Series, the 1924 World Series went to seven games, meaning that pitchers had far more games to rack up those hits and total bases. Pitchers were also left in games far longer in the 1920s, and as such, tallied almost three times as many at-bats as the 2016 NLDS pitchers. When comparing batting average (.250 to .190) and, even more so, slugging percentage (.625 to .333) it becomes clear that this year’s Cubs-Giants pitchers still reign supreme.

1970 ALCS

Here’s our winner: the only series that I believe tops the recently concluded Cubs-Giants NLDS in terms of output from pitchers at the plate. This was an even shorter series than Cubs-Giants, as the Orioles only needed three games to dispatch the Twins. And their pitchers were a good chunk of the reason why. The Orioles used just four pitchers in the series, but all four got hits, combining for all of the offense you see above. (Twins pitchers were 0-for-5 in the series.) Not only did all four get hits, but all three starters got extra-base hits, as Dave McNally, Jim Palmer, and Mike Cuellar (Dick Hall was the reliever) all showed what they were capable of on the other side of the ball. Of course, the very next season, these three starters, along with Pat Dobson, would form just the second-ever set of four 20-game winners on the same team, proving just how awesome the late ’60s and early ’70s Orioles really were. They reign supreme for now, but let’s see how those Cubs starting pitchers do for the rest of the 2016 playoffs.


Let’s Get the Twins to the World Series

Imagine for a second that MLB Commissioner Rob Manfred has gone senile. I know that’s a ridiculous premise, and this is sure to be a ridiculous post, but bear with me. Commissioner Manfred, perhaps after a long night of choice MLB-sponsored adult beverages, has placed the Minnesota Twins in the playoffs. Yes, the same Twins of the .364 win percentage and facial hair promotional days. What is the probability that they make or win the World Series? For simplicity, let’s say they take the place of both AL Wild Card teams and are just inserted into the divisional playoffs.

We are going to look at a bunch of ways of estimating the probability the Twins win a five-game series or a seven-game series, then multiply our results accordingly to find an estimate for the team reaching each round. We’ll start simply, and gradually progress to more complicated methods of estimation. Let’s start as simply as possible, then, and use the Twins’ .364 win percentage, treating every game as an independent event with that same chance of a win. The probability of the Twins winning a five-game series (at least three out of five games) is 25.7%. The same process gives them a 22.4% chance of winning a seven-game series. Multiplying these out (one five-game series and one seven-game series to reach the World Series, then another seven-game series to win it) gives the Twins a 5.8% chance of reaching the World Series (roughly 1 in 17) and a 1.3% chance of winning it. For reference, those are nearly the same odds FanGraphs gave the Mets of reaching/winning the World Series on October 2nd. Of course, those Mets also had to get through the Wild Card round (and the greatest frat boy to ever pitch a playoff game), but failed to do so.
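The arithmetic behind those figures can be sketched in a few lines of Python. This is a minimal sketch, assuming every game is an independent coin flip at the .364 rate; the function name `series_win_prob` is made up for illustration and is not from any library.

```python
from math import comb


def series_win_prob(p: float, games: int) -> float:
    """Chance of winning a majority of `games` independent games.

    Counting "at least 3 of 5" over all five games gives the same answer
    as a real best-of-five that stops once one side has three wins.
    """
    need = games // 2 + 1
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )


p = 0.364  # Twins' win percentage, as quoted in the text
p5 = series_win_prob(p, 5)   # divisional round
p7 = series_win_prob(p, 7)   # LCS or World Series
reach_ws = p5 * p7           # win the five-gamer, then one seven-gamer
win_ws = reach_ws * p7       # then win a second seven-gamer

print(f"five-game series:   {p5:.1%}")
print(f"seven-game series:  {p7:.1%}")
print(f"reach World Series: {reach_ws:.1%}")
print(f"win World Series:   {win_ws:.1%}")
```

Swapping in .365 for `p` is all it takes to rerun the same estimate against playoff teams only.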

Okay, so maybe you didn’t like that method because we included the Twins’ entire regular season, instead of just including games against playoff teams. Noted, but the Twins had basically the same win percentage against playoff teams (.365) as they did overall (.364). For the record, I defined playoff teams as the six division winners plus the four wild card teams. Using the Twins’ percentage against playoff teams yields virtually the same probabilities as above.

How else can we attack this problem? Well, the Twins played 162 games this year, which means they have 158 different five-game stretches and 156 seven-game stretches. Over all those five-game rolling “series”, the Twins won at least three games 24.1% of the time, and they won at least four games in 25% of their seven-game tilts. Multiplying those figures out gives them a 6% chance of reaching the World Series and a 1.5% chance of becoming world champs.
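Here is a rough sketch of how that rolling-stretch count could be done. The `toy_season` list is a made-up placeholder, not the Twins’ actual game log, and `rolling_series_rate` is a hypothetical helper name; only the 24.1% and 25% figures at the bottom come from the text.

```python
def rolling_series_rate(results, length):
    """Share of consecutive `length`-game stretches with a majority of wins.

    `results` is a season in order: 1 for a win, 0 for a loss.
    Returns (share of stretches won, number of stretches).
    """
    need = length // 2 + 1
    n_windows = len(results) - length + 1
    won = sum(
        1 for i in range(n_windows) if sum(results[i:i + length]) >= need
    )
    return won / n_windows, n_windows


# Made-up 162-game placeholder season, NOT real results.
toy_season = [1, 0, 0, 1, 0, 1, 0, 0, 1] * 18
rate5, n5 = rolling_series_rate(toy_season, 5)
rate7, n7 = rolling_series_rate(toy_season, 7)
print(n5, n7)  # 158 and 156 stretches in a 162-game season

# Multiplying out the rates quoted in the text (24.1% and 25%).
reach_ws = 0.241 * 0.25
win_ws = reach_ws * 0.25
print(f"reach: {reach_ws:.1%}, win: {win_ws:.1%}")
```

Note that overlapping stretches share games, so these rolling “series” are not independent samples; the rate is a description of the season more than a clean probability estimate.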

Again, those numbers are unsatisfying because they include all teams, not just the playoff teams. However, removing the non-playoff teams leaves us with a bit of a sample issue because the Twins played only 52 games against playoff teams. So, let’s change the problem slightly: what is the probability that a last-place team can reach, and win, the World Series? The teams I’ll be considering all finished last in their respective divisions: Twins, Athletics, Rays, Braves, Reds, and Padres. Cumulatively, these teams had a win percentage of .412, won 37.4% of their games against playoff teams, won at least three games in 30.6% of their five-game stretches, and won at least four out of seven 29.9% of the time. You can multiply these percentages out and get some answers.
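For anyone who would rather not do the multiplying by hand, here is one way it might be laid out, using only the cumulative last-place figures quoted above. It assumes the two win percentages get the same independent-games treatment as the Twins’ .364, while the rolling-stretch rates are multiplied directly; the helper and dictionary labels are illustrative names only.

```python
from math import comb


def series_win_prob(p, games):
    """Chance of winning a majority of `games` independent games."""
    need = games // 2 + 1
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )


# (five-game series chance, seven-game series chance) for each method
estimates = {
    "overall win pct (.412)": (
        series_win_prob(0.412, 5), series_win_prob(0.412, 7)
    ),
    "vs. playoff teams (.374)": (
        series_win_prob(0.374, 5), series_win_prob(0.374, 7)
    ),
    "rolling stretches": (0.306, 0.299),
}

for label, (p5, p7) in estimates.items():
    reach_ws = p5 * p7      # one five-game series, one seven-game series
    win_ws = reach_ws * p7  # plus a second seven-game series
    print(f"{label}: reach {reach_ws:.1%}, win {win_ws:.1%}")
```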

I’m still not satisfied, so there is one more tool I’m gonna break out: a bootstrap simulation. Bootstrapping basically means sampling with replacement, which means every time I randomly choose a game from the sample, that game is thrown back in and has the same exact chance of getting picked again. This resampling with replacement process gives the bootstrap some pretty useful properties that I won’t get into here, but you can check here for more info.

I’m going to put all the games the last-place teams played against playoff teams into a pile. I’m going to randomly sample five games from that pile, with replacement, and count how many games were wins. I’m going to do this 100,000 times. I will then divide the number of samples that included at least three wins by the total number of samples, giving me an estimated probability of these last-place teams winning a five-game series against a playoff team. I will repeat this process for a seven-game series.
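The procedure just described might look something like this in Python. It is only a sketch: the real pile of games isn’t reproduced here, so `pile` is a stand-in of 1,000 placeholder games built to match the 37.4% win rate quoted earlier, and `bootstrap_series_prob` is a hypothetical name.

```python
import random


def bootstrap_series_prob(pile, length, n_samples=100_000, seed=0):
    """Estimate the chance of winning a majority of `length` games.

    Each sample draws `length` games from `pile` with replacement, so a
    game can be picked more than once within the same sample.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    need = length // 2 + 1
    hits = sum(
        1
        for _ in range(n_samples)
        if sum(rng.choices(pile, k=length)) >= need
    )
    return hits / n_samples


# Placeholder pile (assumption): 374 wins and 626 losses, i.e. 37.4%.
pile = [1] * 374 + [0] * 626

boot5 = bootstrap_series_prob(pile, 5)
boot7 = bootstrap_series_prob(pile, 7, seed=1)
reach_ws = boot5 * boot7
win_ws = reach_ws * boot7
print(f"five-game: {boot5:.1%}, seven-game: {boot7:.1%}")
print(f"reach World Series: {reach_ws:.1%}, win: {win_ws:.1%}")
```

Because every draw comes from the same pile with replacement, each sampled game is effectively an independent draw at the pile’s win rate, so the results should hover close to what the straight at-least-three-of-five calculation gives for .374, with the difference being sampling noise.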

The bootstrap probability of a last-place team winning a five-game series against a playoff team was 27%. The probability of them winning a seven-game series was 24%. They have a 6.5% chance of reaching the World Series and 1.6% chance of winning it.

Honestly, these probabilities are lower than I expected. I have believed in and learned to embrace the randomness of the MLB postseason. I went into this post expecting the outcome to highlight just how random the postseason really is, even absurdly so. However, the randomness of the postseason really depends on the extremely small differences between all the teams at the top, so inserting teams from the very bottom of the league introduces a level of certainty that would be new to the playoffs. Then again, imagine repeating a similar exercise for the NFL or NBA. The 27% or so chance I’d give the Twins of advancing seems much higher than the probability of, say, the Cleveland Browns winning a playoff game if inserted into the postseason.

My methodology was clearly very simple, but intentionally so. I gave no acknowledgement to a home-field advantage adjustment, and I looked only at the team’s W-L record. A more complex method could have taken into consideration Pythagorean Expectation or BaseRuns.

This was a ridiculous post and ultimately a meaningless exercise. The Twins probably couldn’t reach the World Series if they were placed in the playoffs, but I’ll point out that as of this writing (October 10th during Game 3 of Nationals-Dodgers) the Cubs also probably won’t reach the World Series. Baseball is a weird and wonderful sport, and the postseason is the weirdest and most wonderful time of the year. If the Twins could conceivably reach the World Series as currently constructed, don’t think too hard about what’s happening and just enjoy.


53 Things About a 53-Second Finnish Baseball Video

With no baseball being played on this Monday night as I write this, I thought I’d throw this out for a quick fix.  Granted, this is baseball as it’s played in Finland:

 

Below is a second-by-second recap of all the glorious action.

{note – because the Stone-Age author doesn’t know how to post GIFs into an article, you’ll have to pause the video yourself to freeze the action for each of the 53 seconds}

0:01 – Dude in the white-striped uniform way off the plate, obviously trying to avoid catcher’s interference because of the dude in the orange-and-blue uniform.

0:02 – Orange-and-blue apparently spots the pitcher striding towards the pitcher’s mound, which I guess in Finnish is the “tikli”.

0:03 – There’s a “ski” on the back of the hitter’s jersey, so he must be Sami Haapakoski.  Not likely to be another Polish guy on a Finnish baseball team.

0:04 – And he’s got his hands backwards.  (I’d love to see how he holds a light bulb to screw it in)

0:05 – And now the catcher flips the ball up in the air!  A combination hidden-ball trick/quick-pitch.

0:06 – First baseman charging in…Sami charging at the offering, which can only mean…

0:07 – A line drive over the first baseman’s head.  Well played Sami!

0:08 – Sami now runs down the THIRD-BASE LINE!!!! (being half-Polish myself I have no more capacity to joke).  This means that the runner who’s already there (Jeano Segurannen) has to start running to second.

0:09 – What’s with the water hazard inside the park?  I guess with this being Finnish baseball, they’ve replaced right field with a right fjord.

0:10 – I like the greenery in right fjord.  Gives it a Wrigley-like ambiance (this is the Obligatory 2016 Cubs Reference™ for this article)

0:11 – Crowd going wild, screaming for Sami to run the bases the right way and not blow a well-earned ground-rule double.

0:12 – Or maybe it’s a ground-rule triple if it gets stuck in the poison ivy.  Not sure.

0:13 – Love the hustle on the guy in right fjord.  Plays the game the right way, he does.

0:14 – And emerging from behind a tree there’s an umpire, checking to see if the ball lodged in the poison ivy for a triple or into the water for a double….what, the ball’s IN PLAY??!?

0:15 – Yep. The right fjorder (Jonni Damonen) swiftly tosses a relay to one of his fellow outfjorders.

0:16 – Unfortunately, Ryän Raburninnen isn’t known for having the best “handle” in this sport

0:17 – Average water temperatures in Finland are colder than anywhere in the continental USA.  That’s because they’re measured in degrees Celsius.

0:18 – Look, there’s Jeano rounding the bases the right way

0:19 – Poor right fjorder takes his second plunge in the last five seconds.  Someone please fire up a sauna for ol’ Jonni.

0:20 – And there’s Sami flying like a Finn right behind him.  All this fumbling of the frigid fjord-frozen ball in right fjord has allowed them to finally move forward again.

0:21 – Nice flip by the right fjorder.  Maybe they should move him to second base, wherever the hell they put that in Finland.

0:22 – Nice use of the split screen for the fielding and baserunning portions of the play.  Might catch on for MLB telecasts if they ever tried it.

0:23 – Here comes Sami to his jubilant teammates….

0:24 – …PSYCH!!…

0:25 – …running up the third-base line without him

0:26 – The right fjorder pulls his hypothermic body up Tallinn’s Hill, his efforts having been to no avail.

0:27 – Why are they running out there with their bats?  I am so thoroughly confused.

0:28 – Led Zeppelin, the official sponsor of the third-base warning track.

0:29 – Those uniforms make these guys look like a NASCAR pit crew.  Waiting for one of them to hand Sami a champagne bottle to spray the place.

0:30 – Some guy in a blue jacket is taking a stroll in from left field, apparently oblivious to all the mayhem.

0:31 – This part of the field is also used for the Finnish Capture The Flag League.

0:32 – Finnish vodka is excellent.  Just ask the camera guy.

0:33 – Guy in blue jacket has a helmet on.  Must be from a different pit crew.

0:34 – Ebullient Finnish yelling.

0:35 – This part of the field was formerly used by the local Finnish Basketball Association team.  The team disbanded once it was discovered that someone forgot to put up an actual basket.

0:36 – The one guy with a green helmet comes towards the camera with his bat in ready position.  Must be the team’s enforcer.

0:37 – “HAYYYYY!!!”

0:38 – Another yell sounding like “BASEBALLLL!!!!”

0:39 – Coach about to give Sami a water bottle for all his efforts with the bat and on the basepaths (both clockwise and counterclockwise)

0:40 – Fun fact: one of those long Finnish words on Sami’s uni means “this space available for sale”.  I forgot exactly which one it was.

0:41 – At least Sami holds the water bottle correctly.

0:42 – How come there’s no left fjord?

0:43 – Fuzzy blue feet can only mean one thing — a mascot!  Wonder who/what they have for mascots in Finland?

0:44 – It’s the love child of these two!  Sweet!

0:45 – Not sure what that thing is over the bleachers behind home plate (home Frisbee?).  Looks vaguely aerodynamic.

0:46 – Someone obviously has a job that includes coordinating handtowels to these guys’ uniforms.  The age of specialization is not merely a North American phenomenon.

0:47 – Because Finnish baseballs are often contaminated with fjord-borne bacteria, used handtowels are the souvenir of choice.

0:48 – Eriko is like… what?

0:49 – Ignoring the two kids waving for the towel in the front, Sami fires a Hail Mary pass for the blonde in the top row.

0:50 – Notice all the parkas and heavy winter clothing on these fans.  Although the average game-time temperature in Finland is about 17°C, the temperature on this evening was only 10°C, which is just 10 degrees above the freezing point of the right fjorder’s uniform.

0:51 – Nobody bothered to man the lemonade stand in left field just past the bleachers.  Guy in the blue jacket probably just walked off with the lemons.

0:52 – Can the Finnish president override a vimpelin veto?

0:53 – Fun fact: the official logo of Superpesis, the major league of Finnish baseball, is basically the same as the NBC peacock.

Thank you for watching, and have a nice day.


Hardball Retrospective – What Might Have Been – The “Original” 2002 Blue Jays

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

Assessment

The 2002 Toronto Blue Jays 

OWAR: 51.4     OWS: 312     OPW%: .572     (93-69)

AWAR: 34.2      AWS: 234     APW%: .481     (78-84)

WARdiff: 17.2                        WSdiff: 78  

The 2002 “Original” Blue Jays breezed to the American League East title, vanquishing the Yankees by a nine-game margin. Toronto topped the American League in OWAR and OWS. Shawn Green (.285/42/114) registered 110 tallies, achieved his second All-Star appearance and finished fifth in the MVP balloting. Jeff Kent (.313/37/108) drilled 42 doubles and attained a career-high in home runs. Carlos Delgado belted 33 round-trippers and coaxed 102 bases on balls. John Olerud (.300/22/102) laced 39 two-base hits and collected the Gold Glove Award. In the midst of five straight seasons with a batting average above .300, Shannon Stewart sliced 38 doubles and scored 103 runs. Vernon Wells reached the century mark in RBI and added 34 two-base knocks in his first full season. The “Actual” squad featured 2002 AL Rookie of the Year Eric Hinske (.279/24/84) at the hot corner.

Jeff Kent placed 48th among second-sackers in “The New Bill James Historical Baseball Abstract” top 100 player rankings while John Olerud secured the 53rd slot at first base.

Original 2002 Blue Jays                            Actual 2002 Blue Jays

STARTING LINEUP | POS | OWAR | OWS | STARTING LINEUP | POS | AWAR | AWS
Shannon Stewart | LF | 2.37 | 18.47 | Shannon Stewart | LF | 2.37 | 18.47
Vernon Wells | CF | 0.83 | 16.7 | Vernon Wells | CF | 0.83 | 16.7
Shawn Green | RF | 6.18 | 32.07 | Jose L. Cruz | RF/LF | 1.73 | 12.62
John Olerud | DH/1B | 4.64 | 25.92 | Josh Phelps | DH | 1.46 | 9.8
Carlos Delgado | 1B | 4.76 | 25.97 | Carlos Delgado | 1B | 4.76 | 25.97
Jeff Kent | 2B | 6.04 | 29.93 | Dave Berg | 2B | 0.18 | 8.61
Alex S. Gonzalez | SS | 2.78 | 14.36 | Chris Woodward | SS | 2.17 | 11.74
Chris Stynes | 3B | -0.02 | 3.46 | Eric Hinske | 3B | 3.8 | 21.81
Greg Myers | C | 0.57 | 5.57 | Tom Wilson | C | 0.43 | 5.88
BENCH | POS | OWAR | OWS | BENCH | POS | AWAR | AWS
Jay Gibbons | RF | 0.59 | 11.97 | Raul Mondesi | RF | 0.08 | 6.33
Chris Woodward | SS | 2.17 | 11.74 | Orlando Hudson | 2B | 1.17 | 5.89
Craig A. Wilson | RF | 0.95 | 10.78 | Felipe Lopez | SS | 0.08 | 5.8
Michael Young | 2B | -0.63 | 10.72 | Ken Huckaby | C | -1.24 | 1.78
Josh Phelps | DH | 1.46 | 9.8 | Joe Lawrence | 2B | -0.83 | 1.48
Orlando Hudson | 2B | 1.17 | 5.89 | Dewayne Wise | RF | -0.42 | 1.39
Felipe Lopez | SS | 0.08 | 5.8 | Jayson Werth | RF | 0.04 | 0.77
Brent Abernathy | 2B | -0.44 | 4.99 | Homer Bush | 2B | -0.27 | 0.75
Abraham Nunez | 2B | 0.04 | 4.88 | Darrin Fletcher | C | -0.44 | 0.64
Cesar Izturis | SS | -0.68 | 3.77 | Brian Lesher | 1B | -0.5 | 0.23
Ryan Thompson | LF | 0.14 | 2.84 | Kevin Cash | C | -0.14 | 0.08
Joe Lawrence | 2B | -0.83 | 1.48 | Pedro Swann | DH | -0.18 | 0
Pat Borders | DH | 0.06 | 0.36 | | | |
Mike Coolbaugh | 3B | -0.17 | 0.16 | | | |
Casey Blake | 3B | -0.11 | 0.11 | | | |
Kevin Cash | C | -0.14 | 0.08 | | | |

Roy “Doc” Halladay (19-7, 2.93) warranted his first All-Star invitation and led the American League with 239.1 innings pitched. David “Boomer” Wells compiled 19 victories with a 3.75 ERA. Toronto’s superb bullpen staff was anchored by Billy Koch (3.27, 44 SV) and Jose Mesa (2.97, 45 SV). The setup corps consisted of Steve Karsay (3.26, 12 SV), Ben Weber (7-2, 2.54) and Kelvim Escobar (4.27, 38 SV).

Original 2002 Blue Jays                          Actual 2002 Blue Jays

ROTATION | POS | OWAR | OWS | ROTATION | POS | AWAR | AWS
Roy Halladay | SP | 6.74 | 21.67 | Roy Halladay | SP | 6.74 | 21.67
David Wells | SP | 3.99 | 14.79 | Pete Walker | SP | 1.85 | 8.74
Woody Williams | SP | 3.2 | 9.65 | Mark Hendrickson | SP | 1.23 | 4.01
Gary Glover | SP | 0.03 | 4.54 | Esteban Loaiza | SP | -0.15 | 3.86
Mark Hendrickson | SP | 1.23 | 4.01 | Justin Miller | SP | -0.23 | 3.4
BULLPEN | POS | OWAR | OWS | BULLPEN | POS | AWAR | AWS
Billy Koch | RP | 1.44 | 18.37 | Kelvim Escobar | RP | 0.53 | 9.14
Jose Mesa | RP | 1.28 | 12.4 | Cliff Politte | RP | 1.05 | 6.49
Steve Karsay | RP | 2.01 | 11 | Corey Thurman | RP | 0.54 | 3.66
Ben Weber | RP | 1.33 | 10.48 | Felix Heredia | RP | 0.09 | 3.12
Kelvim Escobar | RP | 0.53 | 9.14 | Scott Eyre | RP | 0.11 | 2.83
Mike Timlin | RP | 1 | 8.04 | Chris Carpenter | SP | 0.41 | 2.73
Giovanni Carrara | RP | 0.62 | 6.77 | Steve Parris | SP | 0 | 1.88
David Weathers | RP | 1.02 | 6.68 | Scott Cassidy | RP | -0.43 | 1.67
Chris Carpenter | SP | 0.41 | 2.73 | Dan Plesac | RP | 0.33 | 1.39
Graeme Lloyd | RP | -0.53 | 1.89 | Brian Bowles | RP | 0.04 | 1.37
Scott Cassidy | RP | -0.43 | 1.67 | Jason Kershner | RP | 0.12 | 0.65
Jose Silva | RP | 0.11 | 1.38 | Pedro Borbon | RP | -0.07 | 0.48
Brian Bowles | RP | 0.04 | 1.37 | Scott Wiggins | RP | 0.05 | 0.2
Mark Lukasiewicz | RP | 0 | 1.17 | Pasqual Coco | RP | -0.13 | 0
Jim Mann | RP | 0.18 | 1.02 | Brian Cooper | SP | -0.59 | 0
Carlos Almanzar | SW | 0.24 | 0.94 | Bob File | RP | -0.47 | 0
Tom Davey | RP | -0.36 | 0.17 | Brandon Lyon | SP | -0.56 | 0
Pasqual Coco | RP | -0.13 | 0 | Luke Prokopec | SP | -0.91 | 0
Bob File | RP | -0.47 | 0 | Mike Smith | SP | -0.45 | 0
Pat Hentgen | SP | -0.54 | 0 | | | |
Brandon Lyon | SP | -0.56 | 0 | | | |
Aaron Small | RP | -0.08 | 0 | | | |
Mike Smith | SP | -0.45 | 0 | | | |
Todd Stottlemyre | SP | -0.38 | 0 | | | |

Notable Transactions

Shawn Green 

November 8, 1999: Traded by the Toronto Blue Jays with Jorge Nunez (minors) to the Los Angeles Dodgers for Pedro Borbon and Raul Mondesi. 

Jeff Kent 

August 27, 1992: Traded by the Toronto Blue Jays with a player to be named later to the New York Mets for David Cone. The Toronto Blue Jays sent Ryan Thompson (September 1, 1992) to the New York Mets to complete the trade.

July 29, 1996: Traded by the New York Mets with Jose Vizcaino to the Cleveland Indians for Carlos Baerga and Alvaro Espinoza.

November 13, 1996: Traded by the Cleveland Indians with a player to be named later, Julian Tavarez and Jose Vizcaino to the San Francisco Giants for a player to be named later and Matt Williams. The Cleveland Indians sent Joe Roa (December 16, 1996) to the San Francisco Giants to complete the trade. The San Francisco Giants sent Trent Hubbard (December 16, 1996) to the Cleveland Indians to complete the trade. 

John Olerud 

December 20, 1996: Traded by the Toronto Blue Jays with cash to the New York Mets for Robert Person.

October 27, 1997: Granted Free Agency.

November 24, 1997: Signed as a Free Agent with the New York Mets.

October 29, 1999: Granted Free Agency.

December 15, 1999: Signed as a Free Agent with the Seattle Mariners. 

Billy Koch

December 7, 2001: Traded by the Toronto Blue Jays to the Oakland Athletics for Eric Hinske and Justin Miller.

Honorable Mention

The 1995 Toronto Blue Jays 

OWAR: 27.1     OWS: 208     OPW%: .469     (76-86)

AWAR: 25.4       AWS: 168      APW%: .389    (56-88)

WARdiff: 1.7                        WSdiff: 40

The “Original” ’95 Jays plodded to a fourth-place finish in the AL East, eleven games behind the Orioles, while the horrific “Actuals” placed 30 games behind the Red Sox. David Wells delivered a 16-8 record with a 3.24 ERA and made his first appearance at the Mid-Summer Classic. Jose Mesa (1.13, 46 SV) blossomed in the closer’s role, meriting second place in the Cy Young Award balloting along with a fourth-place finish in the MVP race. Derek Bell pilfered 27 bases and established personal bests in BA (.334) and OBP (.385). Fellow outfielder Glenallen Hill clubbed 24 long balls and set career highs with 86 RBI and 25 stolen bases. Geronimo Berroa launched 22 taters and knocked in 88 runs. Jeff Kent contributed 20 dingers and John Olerud socked 32 doubles.

On Deck

What Might Have Been – The “Original” 1902 Cubs

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive