Josh Donaldson vs. the Elite

Tip: Don’t understand an acronym? Just click on it and it will take you to the corresponding FanGraphs glossary of terms.

Watching the final game of the Yankees – A’s series last week, which featured one of the game’s finest pitchers in Masahiro Tanaka, I had a thought during Josh Donaldson’s final at-bat against the Japanese hurler. After he struck out to finish 0-for-3 against Tanaka, my mind traveled back to the ALDS Game 5s of the past two years. It’s no secret the A’s crashed out against a dominant Verlander in both 2012 & 2013, just like it’s no secret that Josh Donaldson was almost entirely absent in both of those very important games: 1-for-7, 0 BB, 3 K (with all 3 of those Ks coming in 2013’s Game 5). Seven at-bats is obviously an incredibly small sample size, especially for an up-and-coming player getting his first taste of the postseason. However, given what Donaldson means to the A’s, there were certainly quiet rumblings of disappointment among the fan base.

Verlander is very good; it seems he’s especially good in high leverage situations when his team needs him. Josh Donaldson is also very good, posting 7.7 WAR last year in 158 games. This year, Donaldson has been even better, posting 3.4 WAR through just 62 games and asserting himself in the conversation of the best overall players in baseball. A sizable portion of that WAR comes from the plus defense he plays, but his bat is what he’s known for: since getting called up from the minors on August 14th, 2012 (the point at which his consensus “breakout” started), he’s batted .291/.377/.509 with a wRC+ of 148 (which means that Donaldson has created 48% more runs than a league average player). Only one player has higher WAR in 2013 and 2014 combined (Mike Trout), and only nine other players have higher wRC+. Josh Donaldson is an elite defensive and offensive player by many metrics.

After watching Donaldson’s at-bats against Tanaka, I started wondering how he fares against other elite pitchers in the game, having an unproven hunch he might struggle against them. We know that most everyone struggles against elite pitching, as that is generally the very definition of elite pitching; however, there’s the larger question of just how much impact elite pitching has on hitting statistics, and how elite hitters fare against elite pitching. One might assume that elite hitters are better able to succeed against elite pitching. Looking at Donaldson’s statistics, you wouldn’t think that is the case.

Pulling data from the start of the 2013 season, I’ve identified some of the “elite” pitching that Donaldson has gone up against. I’ve tried to identify pitchers he has faced most often in terms of plate appearances – fortunately (for our sake at least), those pitchers he’s seen most often are also elite arms in his division, like Felix Hernandez, Yu Darvish, and Hisashi Iwakuma. All pitchers on this list are ranked in the top 15 for xFIP for 2013-2014 (minimum 160 innings pitched) with the exception of Verlander (77th) & Lester (41st). I’ve included them as their FIP rankings are in the top 40, and because I’ve already used Verlander as a benchmark above. Here are Donaldson’s statistics for 2013 & 2014 against some of the best arms in the game, with his total statistics overall in the final line for reference:

Donnie_VS._Elite

These figures don’t include the 2012 and 2013 postseason series against the Tigers, which actually helps Donaldson’s case. However, let’s get the small sample size disclaimer out of the way before we continue. 113 plate appearances is about a month’s worth of full-time hitting statistics, which is not a tremendous sample to draw from, but not insubstantial either. What’s clear from these numbers is that Donaldson really struggles against elite arms, posting awful strikeout and walk rates and severely depressed average, on base, and power numbers (just 7 extra base hits in 104 at-bats).

One larger question we have to answer is whether Donaldson’s drop in production vs. elite pitching is congruent with the standard drop in production any hitter would expect when going up against this level of competition. To find that out, I combined all of the batting-against statistics for these 12 pitchers for all of 2013 & 2014, a total of 12,534 plate appearances, which gives us a “league average” line vs. these pitchers. The findings? These elite arms are really good. Big surprise, right? In fact, the league strikeout and walk rates against these pitchers are very close to Donaldson’s rates, with the walk rate exactly the same. Here are Donaldson’s numbers vs. the elite pitchers, his overall numbers vs. all competition, and then the league average line vs. the elite arms:

Donnie_BB_K_Rate
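For anyone who wants to build a similar “league average” line, here is a minimal Python sketch of one way the combining step could be done: add up the raw counts for every pitcher first, then compute the rates from the pooled totals. The AgainstLine class, the combine function, and the two sample stat lines are made up for illustration; they are not the data behind the chart above and not necessarily the exact procedure used for it.

```python
from dataclasses import dataclass

@dataclass
class AgainstLine:
    """Batting-against totals for one pitcher (hypothetical structure)."""
    pa: int  # plate appearances
    ab: int  # at-bats
    h: int   # hits allowed
    bb: int  # walks allowed
    so: int  # strikeouts

def combine(lines):
    """Pool raw counts across pitchers, then derive rates from the totals."""
    pa = sum(line.pa for line in lines)
    ab = sum(line.ab for line in lines)
    h = sum(line.h for line in lines)
    bb = sum(line.bb for line in lines)
    so = sum(line.so for line in lines)
    return {"PA": pa, "AVG": h / ab, "BB%": bb / pa, "K%": so / pa}

# Made-up totals for two pitchers, for illustration only.
pitchers = [
    AgainstLine(pa=1000, ab=920, h=210, bb=60, so=250),
    AgainstLine(pa=500, ab=450, h=100, bb=40, so=140),
]
league = combine(pitchers)
print(f"{league['PA']} PA, AVG {league['AVG']:.3f}, "
      f"BB% {league['BB%']:.1%}, K% {league['K%']:.1%}")
```

Pooling the counts weights each pitcher by how many batters he faced, which is usually what you want here. Averaging the twelve individual rates instead would give a pitcher with a short season the same say as one with 400 innings.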

Even though we’re looking at the best pitchers in baseball, these statistics were still a bit surprising to me, as these league-wide walk and strikeout rates are abysmal from a hitter’s perspective. How does Donaldson’s slash line compare to the league average? Again, let’s take a look:

Donnie_3_Stats

We know that Donaldson’s poor BB and K rates fit tidily within the standards of the league line, as seen in the first graph, but his slash lines tell us that he’s been far worse than the rest of the league against these elite pitchers in the limited plate appearances we’re looking at. Shouldn’t we expect a player of his offensive caliber to fare better than league average against this level of competition?

The answer is not necessarily. Donaldson’s approach at the plate has a large bearing on the fact that he struggles against elite pitching. He is not a contact hitter, posting below-average marks in swinging strike percentage, contact percentage, and Z-Contact percentage. In fact, he has changed his approach over the past calendar year specifically to try to hit more home runs, resulting in a jump of almost 5 percentage points in his strikeout rate from 2013 to 2014 (16.5% to 21.1%), but also increasing his home run per fly ball rate by almost 7 points to 17.3%, an elite mark for someone who plays half of his games in one of the most pitcher-friendly ballparks in baseball. Coupled with an increase in his walk rate, Donaldson’s run creation output has benefited from Chili Davis’ hitting instruction, which has him sitting on pitches he is more likely to drive and swinging hard at the cost of a lower average and higher strikeout rate. Donaldson batted .301 in 2013 with an inflated BABIP (.333), but with his change of approach, he projects somewhere in the .270 range moving forward.

Donaldson fits the profile of a hitter who may be more apt to struggle against the elite pitching in the league, for the simple reason that elite pitchers tend to combine low walks with high strikeouts. For example, against “Power” pitchers (pitchers that are in the top third of the league in strikeouts plus walks), Donaldson has a career line of .210/.316/.356, showing that he struggles with pitchers who have strikeout potential, whether elite or not. He’s not alone in being a top offensive player who struggles against power pitching relative to his overall performance: the benevolent baseball god Mike Trout slashes a fairly pedestrian (for him) .269/.379/.473 against the high strikeout arms.

The most important point to remember when looking at these statistics is that Josh Donaldson is currently one of the best players in baseball, regardless of his past performance versus elite pitching. He is a player that has enjoyed only a year and a half of sustained high-level performance and is continuing to make adjustments in hopes of greater success, which could completely alter his future at bats versus these elite arms I’ve highlighted. However, my gut tells me he may always struggle with these pitchers due to his approach at the plate, which trades contact for power – an Oakland A’s team-wide trait. It bears further scrutiny in the future for his potential playoff success, as he will obviously face more elite pitching in October when the average arms have gone home for the offseason. Will Donaldson and the Oakland A’s home run-centric approach carry them to a deep playoff run against the best arms in the game? Fortunately for us, it looks like we’re going to find out.

Wondering about the two home runs he hit off of Bumgarner and Sale? EXTRA CREDIT BONUS FREE BASEBALL GIFS!

Off Madison Bumgarner: May 27, 2013, 2-0, no out, 1 on, 4-seam fastball:

Donnie_Bums

Off Chris Sale: June 8th, 2013, 1-1, 1 out, 3 on (oppo taco all the way), 2-seam fastball:

Donnie_Sale

 


Over and Under-Performances in Baserunning

Right now Eric Hosmer is the worst base runner of 2014 by a decent margin over Adam Dunn. The Adam Dunn part makes sense, but the Hosmer part makes very little: he is an athletic player and not your traditional base-clogging oaf. For his career, Hosmer’s Spd rating is 4.4, which says he is right at average for speed overall. Last year he was 11 of 15 on stolen base attempts, and the year before he stole 16 bags in 17 tries. You expect that the best base runners are fast and the worst are slow, and generally that seems to be true. When it is not true, though, there is an interesting difference between the groups.

I went out to look for two groups. The first was a group of really fast players who had bad years on the base paths. The cut-offs for them were a Spd rating of 7 or higher, which is considered excellent speed, and a negative BsR, meaning they were a liability on the bases despite their speed. For Spd, 4.0 is considered below average, so for the second group I looked for players below that mark who managed to have great base running years, meaning anything above 5 BsR.
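If you want to run the same kind of search yourself, the two cut-offs above boil down to a pair of filters. Below is a minimal Python sketch; the PlayerSeason structure, the function names, and the three sample rows are invented for illustration, and a real version would load the qualified player seasons from an export instead.

```python
from typing import NamedTuple

class PlayerSeason(NamedTuple):
    """One qualified player season (hypothetical structure)."""
    year: int
    name: str
    spd: float  # speed score
    bsr: float  # base running runs

def fast_but_costly(seasons):
    """Excellent speed (Spd of 7 or higher) but negative base running value."""
    return [s for s in seasons if s.spd >= 7.0 and s.bsr < 0]

def slow_but_valuable(seasons):
    """Below-average speed (Spd under 4.0) but more than 5 BsR."""
    return [s for s in seasons if s.spd < 4.0 and s.bsr > 5.0]

# Made-up rows, for illustration only.
sample = [
    PlayerSeason(2000, "Player A", 7.4, -1.2),
    PlayerSeason(2001, "Player B", 3.1, 5.6),
    PlayerSeason(2002, "Player C", 5.0, 2.0),
]
print([s.name for s in fast_but_costly(sample)])
print([s.name for s in slow_but_valuable(sample)])
```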

The total sample went back through the 1980 season for batting title qualified players, which included 5049 player years.  The group of fast players who had bad base running looks like this:

Year Player
1993 Al Martin
2003 Alex Sanchez
1984 Bill Doran
1983 Brett Butler
1991 Dan Gladden
1996 Fernando Vina
1982 Garry Templeton
1990 Lance Johnson
2001 Luis Castillo
1994 Luis Polonia
1984 Rudy Law
1990 Sammy Sosa
1991 Steve Finley

There are a lot of good players in there, and one legitimate superstar in Sammy Sosa. You will notice that none of them repeated the feat. Only once in their careers did they manage to have the combo of excellent speed with negative base running value. Most of them were just not very good base runners consistently and happened to have an especially bad year to get on the list. Luis Castillo and Lance Johnson were decent on the base paths most years and had a few really good seasons. Rudy Law had a BsR of 10.6 the year before, by far the best season of any of these players, so I don’t know what happened in 1984.

Now to the group of overachieving base runners. It is a small and accomplished list:

Season Name
2003 Albert Pujols
2008 Joe Mauer
2009 Ryan Zimmerman
2009 Scott Rolen

Again, no players repeated the feat, but this time the caliber of player jumps up. Albert Pujols is an all-time great. Scott Rolen is a likely Hall of Famer, and Joe Mauer will probably get there. The only one that isn’t likely to get to Cooperstown is Ryan Zimmerman, but it isn’t inconceivable that he could get there if he can get healthy and put some good seasons up through his 30s. Even when they were young, none of these guys were particularly fast, though Rolen managed to get a Spd of 6.1 once. For all of these guys you can Google and quickly find things about their great work ethic and/or leadership qualities, so maybe only the truly diligent can make up for their lack of speed by being hard-working students of the game.


Madison Bumgarner and His Strikeouts

Madison Bumgarner has pitched like a top tier starter since he appeared on a big league mound in 2009. While his strikeout rate—8.46 career K/9—has always been above average, something has clicked with the big lefty this season, launching him into legitimate No. 1 starter territory. Through 13 starts this season, Bumgarner has a K/9 of 10.04, ranking third in the NL behind only Stephen Strasburg and Zack Greinke.

What’s most fascinating about Bumgarner is that he’s dominating hitters basically with two pitches, both of which—the fourseamer and the cutter—are high velocity pitches. PITCHf/x has Bumgarner throwing a fourseamer 41.19% of the time and a cutter—which FanGraphs lists as a slider—37.22% of the time. For the sake of simplicity, I will refer to the latter pitch as a cutter.

This season, the fourseamer has been especially effective for Bumgarner. Batters are swinging at the pitch at a higher rate (45%) than any previous season, and they are making contact with less frequency, as the 29.50% whiffs per swing shows. In 2013 batters whiffed at the pitch 26.69% of the time when swinging; between 2009-2012, the rate was never higher than 19.81%.

The cutter, whose usage rate has dropped almost 2 percentage points since 2012, has seen results similar to the fourseamer’s. Hitters swing at the cutter 57.25% of the time and whiff on 24.33% of those swings.

A big reason for the diminished contact rate is the fact that Bumgarner is throwing his pitches in the strike zone less often than in years past. His in-zone rate is just 39.63%, the only time in his career that it’s been under 40%. When hitters swing at pitches out of the strike zone (which they do 36.01% of the time, a career high), they whiff 37.7% of the time. When swinging at pitches in the zone, the whiff rate is 16.75%, a rate that surpasses those in any of his previous seasons.

When Bumgarner gets hitters into two-strike counts, his approach stays the same for the most part. In those counts, he throws his fourseamer 41.41% of the time and his cutter 37.5% of the time, both numbers just slightly above the overall usage in any count. The one aspect that changes in two-strike counts is Bumgarner’s usage of his curveball. Overall, he throws the pitch 11.93% of the time. In two-strike counts, the usage rate jumps up to 17.5% and batters swing at it 55.84% of the time. The result is a 20.78% whiff rate, highest of all Bumgarner’s pitches in two-strike counts (16.23% fourseamer, 14.55% cutter).

Another aspect of Bumgarner’s dominance this year has been his ability to fight back and limit damage when behind in the count. His fourseamer and cutter have been the main reason for this. When behind in the count 1-0, Bumgarner has thrown the cutter 43.57% of the time and the fourseamer 35.71%. Here’s what happens with those pitches in 1-0 counts:

Pitch Type   Whiff/Swing   Foul/Swing   Swing%
Fourseamer   33.33         50.00        36.00
Cutter       16.67         53.33        49.18

 

Between the two pitches, over half of the swings result in a foul ball. Add in the whiff rates and Bumgarner finds himself back in the driver’s seat more often than not after falling behind.

What about the 1-0 counts that get to 2-0? More often than not—62.22% of the time—he throws the cutter while throwing the fourseamer 31.11% of the time. Here are the results:

Pitch Type   Whiff/Swing   Foul/Swing   Swing%
Fourseamer   33.33         50.0         50.0
Cutter       16.67         53.33        67.86

 

Again, more often than not, Bumgarner is able to fight through being in an unfavorable count. Once he gets to 2-1, he continues to attack hitters with the fourseamer (35.71%) and the cutter (54.29%). Here’s what happens:

Pitch Type   Whiff/Swing   Foul/Swing   Swing%
Fourseamer   33.33         50.0         84.0
Cutter       16.67         53.33        73.68

 

In addition to these numbers, Bumgarner’s current walk rate of 2.01 BB/9 further shows that he doesn’t often lose hitters when falling behind. Rather, he uses his fourseamer and cutter to get himself back into a favorable count and is thus putting hitters away at a career-high rate.


Comparing the Three Cuban Stars: Abreu, Cespedes, and Puig

On February 13, 2012, the Oakland A’s shocked the baseball world by signing Cuban outfielder Yoenis Cespedes. The A’s rarely make big-money signings, but this time they did, signing him to a four-year, $36 million deal. That season, he seemingly led the Oakland A’s to their surprising division title and was thought to be a major candidate for the MVP award for leading the A’s offensive charge. Had it not been for some player on the Los Angeles Angels (I think his name is Mike Trout) winning the Rookie of the Year, Cespedes would have been an easy pick for that award.

During that same season, another Cuban outfielder was signed by a Major League team. This time it was the Los Angeles Dodgers, who on June 28, 2012 signed 21-year-old Yasiel Puig to a seven-year, $42 million contract. Puig played in rookie ball and A ball in 2012 before making his Major League debut with the Dodgers in 2013. From that moment on, Cespedes was seemingly forgotten and “Puigmania” was born. Puig, like Cespedes did for the Athletics, led the Los Angeles Dodgers offense to a division title in his 104 games with them. Puig, too, lost out on Rookie of the Year, but he certainly did provide a strong case for that award.

And this year, Puigmania rolls on but another Cuban slugger has come in as well. Jose Abreu of the Chicago White Sox (on a six year, $68 million contract) has burst onto the scene, making the White Sox one of the story teams this year. And while it is likely that the White Sox won’t make a run like the A’s or Dodgers did, Abreu certainly will make his strong case for Rookie of the Year.

Each of these players is great, all of them with phenomenal talent. One question that has been brought up with the recent emergence of Abreu is which Cuban player is better. Judging everyone based on the stats they have put up and on how each one stacks up by the common scouting framework called “the five tools” (hitting for power, hitting for contact, speed, arm strength, and fielding ability), I will try to present a case for which one of them is truly the best. Now granted, both Cespedes and Puig have had more playing time than Abreu, but that will be taken into account when judging them.

Hitting for Power:

This, to me, is one of the most interesting of the five tools to compare because each of the players has quite a lot of power. Cespedes has yet to post a Major League season in which he has not hit at least 20 homers (he looks to be on pace for that number again this year with his 12 homers in 55 games so far), Puig hit 19 home runs in only 104 games last year, and Jose Abreu has done nothing but knock the cover off the ball so far this year, hitting 17 homers in a mere 47 games. But as I’m sure many readers of this website know, there is more to power than just hitting home runs. Extra bases count. Doubles, triples, and home runs all contribute to one’s ability to hit for power.

Looking at ISO, Abreu is far and away the leader in this category. His .353 ISO leads Cespedes (.218) and Puig (.232) by a very wide margin. But since his .353 ISO is in a limited playing time of only 47 games, I have decided to measure the ISO through the first 47 games of both the careers of Cespedes and Puig. Cespedes’ ISO through his first 47 games was .341 and Puig’s was .310. While I can see Abreu’s power diminishing somewhat from this extraordinary power number, I can’t see Puig and Cespedes quite matching his power hitting ability (even though Cespedes really punished the baseball in the 2013 Home Run Derby).

Edge: Jose Abreu
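For readers who haven’t worked with ISO before: isolated power is slugging percentage minus batting average, which is the same thing as extra bases per at-bat. The short Python sketch below shows both routes giving the same answer. The function names and the 200 at-bat stat line are made up for illustration and do not belong to any of the three players.

```python
def iso_from_slash(avg, slg):
    """ISO as slugging percentage minus batting average."""
    return slg - avg

def iso_from_counts(doubles, triples, homers, at_bats):
    """ISO as extra bases per at-bat: 1 per double, 2 per triple, 3 per homer."""
    return (doubles + 2 * triples + 3 * homers) / at_bats

# Made-up line: 200 AB, 35 singles, 12 doubles, 1 triple, 12 home runs.
singles, doubles, triples, homers, at_bats = 35, 12, 1, 12, 200
hits = singles + doubles + triples + homers
total_bases = singles + 2 * doubles + 3 * triples + 4 * homers

avg = hits / at_bats
slg = total_bases / at_bats
print(f"AVG {avg:.3f}, SLG {slg:.3f}")
print(f"ISO from slash line: {iso_from_slash(avg, slg):.3f}")
print(f"ISO from counts:     {iso_from_counts(doubles, triples, homers, at_bats):.3f}")
```

Because singles cancel out of the subtraction, ISO rewards only the extra bases, which is why it is a better power measure than home run totals alone.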

Hitting for Contact:

This too is an interesting tool to judge because there are so many numbers that indicate contact hitting ability. One could look at batting average to see who the best is, but of course that could easily be countered by BABIP. For example, Puig has the highest batting average of the three, hitting .327, but his BABIP (.385) is over 100 points higher than both of the other two. The other two players have BABIP numbers that are remarkably close to their actual batting averages. Cespedes’ batting average is at .262 with a .261 BABIP, while Abreu’s batting average is at .266 and his BABIP is at .276. But those numbers only measure how often a batted ball falls for a hit, not how often the batter makes contact in the first place.

Each player is good at making contact with the baseball. One would think that because Puig has the highest batting average, he is the best at making contact, but that is actually not true. In fact, of the three players, he makes contact the least. He just happens to hit the ball in such a way that he gets a hit more often than the other two do. In terms of overall contact%, Cespedes makes the most contact with his 74.8% contact rate, Abreu comes after him with 70.9% contact, and Puig is third with 69.9%. When the ball is inside the strike zone, Abreu is slightly better than Cespedes with his 83.3% vs. Cespedes’ 82.5% (Puig is also fairly close, making contact with the ball 81.6% of the time when it is in the strike zone). When the ball is outside the strike zone, Cespedes is once again the contact king with a contact rate of 64%, Abreu is trailing far behind with only 55.5%, and Puig is again in third with 53.3%. Now granted, Puig’s numbers are improving, but so are Cespedes’ numbers, and Abreu is still only in his first season with plenty of time to improve.

Edge: Yoenis Cespedes

Speed (Base running ability):

If anyone is expecting Abreu to be the best in terms of speed and overall base running ability, I’m going to tell you right now to not get your hopes up. Abreu isn’t awful in terms of base running but he is far from great. This is basically between Cespedes and Puig. With more time under his belt, Cespedes does have more stolen bases, but the two are equal in times caught stealing. Cespedes has stolen a total of 23 bases and been thrown out 12 times (a 66% success rate) while Puig has stolen 16 bases and been thrown out 12 times (57% success rate). Abreu has not attempted a steal yet. Then, looking at actual speed in miles per hour, Cespedes has been clocked at a high of 19.4 mph while Puig has been clocked around 20 mph, so Puig has a slight edge in raw speed but not an overwhelming advantage. A look at the sabermetrics should settle who is better.
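The success rates quoted above are simply steals divided by attempts (steals plus times caught). Here is a small Python sketch of that arithmetic using the stolen base and caught stealing totals from the paragraph above; the function name is my own, and returning None for a player with no attempts is just one reasonable way to handle Abreu’s case.

```python
def sb_success_rate(stolen_bases, caught_stealing):
    """Stolen base success rate, or None if the player has no attempts."""
    attempts = stolen_bases + caught_stealing
    if attempts == 0:
        return None
    return stolen_bases / attempts

# Career totals quoted in the text: (stolen bases, caught stealing).
runners = {"Cespedes": (23, 12), "Puig": (16, 12), "Abreu": (0, 0)}

for name, (sb, cs) in runners.items():
    rate = sb_success_rate(sb, cs)
    label = "no attempts" if rate is None else f"{rate:.1%}"
    print(f"{name}: {label}")
```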

To say the least, Yasiel Puig is reckless running on the bases. He runs very fast but he often runs into outs. So needless to say his BsR is hurting. He has a career -5.2 BsR with his low being in 2013 when he had a -4.2 number and his high being this year at -0.9. Yoenis Cespedes is much smarter on the bases. He doesn’t run himself into outs as frequently as Puig does and so his BsR career number sits at 2.9 with a low of 0.6 in 2013 and a high of 1.4 in 2012. And if that isn’t enough to show that Cespedes is better, his career Spd sits at 5.3 while Puig’s is at 4.8. For the record, Abreu’s Spd is at 2.8 and his BsR is at -0.7 so like I said, he isn’t bad but he just isn’t a very fast guy.

Edge: Yoenis Cespedes

Fielding Ability:

Defensive ability is always thought to be one of the toughest things to measure because there is no real perfect way to calculate it. Another thing making it difficult is that while outfielders Puig and Cespedes basically play the same position, Abreu does not. Since he is the only first baseman in this mix of players, we will look at his numbers first.

When stacking him up with the other first basemen, Abreu really doesn’t seem half bad. In terms of UZR, Abreu is 7th among all first basemen with at least 300 innings played with his 2.2 UZR which is slightly above average. In terms of Defensive Runs Saved, Abreu is 24th among all first basemen with at least 300 innings played with his -4 which is deemed below average. So by no means is he bad, he just isn’t great. Now in the outfield, Puig and Cespedes are different stories.

Puig and Cespedes are both very good defensive outfielders. In his career, Puig has been better defensively, posting a career UZR of 3.5 while Cespedes has put up a 2.7. When it comes to Defensive Runs Saved, Puig again holds an advantage with his +7 mark to Cespedes’ -1. All in all, while Abreu is a decent first baseman, Puig is a very good defensive outfielder (not deserving of a Gold Glove, but nonetheless the best defensive player of these three).

Edge: Yasiel Puig

Arm Strength:

Defensive ability isn’t just catching and fielding the ball, it is also having the arm to make big plays. But it is tough to tell who is best because there aren’t many numbers to point to actual arm strength. Puig has some of the more highlight reel arm throws, in terms of both good throws and bad throws, and so his arm has garnered the most attention of the three. Abreu, being a first baseman generally just has to do underhand flips to the pitcher covering the bag at first and occasionally start a double play feed so his arm is really not tested as much. So again, Abreu is eliminated from the conversation almost before it started. It is again between Puig and Cespedes.

Like I said, Puig has made some of the more highlight reel throws, but his being in Los Angeles and in the center of a massive media hub might have some effect on that. Cespedes has made some very strong throws, but playing in Oakland, which gets far less media attention, he doesn’t get as much time on the highlight reels. Still, the arm of Cespedes is not to be denied. Again, he has played more innings than Puig has, so it would be expected that he would have more outfield assists than Puig, and he does. He has 25 assists, 13 more than Puig’s 12. He also has two more throwing errors, with three compared to Puig’s one. But the numbers show that in spite of those throwing errors, Cespedes’ rARM (Outfield Arms Runs Saved) is much higher, being a 12 as opposed to Puig’s 4. The other statistic to rate an outfielder’s arm is ARM (Outfield Arm Runs), another stat designed to show runs saved based on throwing ability, and it still has Cespedes higher with 13.8 to Puig’s 4.1. So sure, Puig has made some good throws, but his arm is not better than that of Yoenis Cespedes.

Edge: Yoenis Cespedes

Judged by the five scouting tools, Cespedes has the edge both in actual scouting reports and by the numbers. Cespedes has the best arm, base running ability, and contact ability, while Puig is the best fielder and Abreu is the best power hitter. If judging only by the five tools, Cespedes appears to be the better player, but in terms of actual production, Puig has done the best over his career to this point. Puig’s 7.2 WAR exactly matches Cespedes’ total in 160 fewer games. Puig also has the highest wOBA of them all (Puig has .415, Cespedes has .344, and Abreu has .396) and the highest wRC+ of the three (Puig with 172, Cespedes with 120, and Abreu with 151). Puig is also the youngest of the three at only age 23, while Cespedes is 28 and Abreu is 27, so there is more time and room for improvement.

And in conclusion, this article would not be complete if I also did not compare the bat flips of the three. So here they are:

Puig:

Cespedes:

And Abreu’s bat drop (I’m sure that he is working on his bat flip though):


Dellin Betances’s Jedi Mind Tricks

Before his June 6th appearance, Dellin Betances had thrown his knuckle curve 255 times, and it had amassed a value of 8 runs above average (according to FanGraphs), but that is not the point of this post. Betances throws the knuckle curve a lot (48% of the time), batters can’t hit it (74% zone contact, 20% out of zone contact!, for a total contact rate of 42%), and when they do, the contact is very weak (15% line drives, 55% ground balls, 10% popups, 0 home runs). It’s impressive but not what I’m interested in.

Here’s a hint, in gif form

DBKC

Batters take the pitch for a called strike all the time. They swing at the curve in the strike zone a measly 29.3% of the time. This is where it gets really crazy: they swing at it out of the strike zone 36% of the time! I’ll let that sink in. This may sound hyperbolic (it’s actually hypergeometric) but a literal blind person would be expected to do better than these pros have. There is an 83.96% chance swinging at random would beat current major league performance.

For a little math aside, you can think of this like one of those marble problems. You have a jar filled with 116 red marbles (pitches in the strike zone) and 139 green marbles (pitches outside the zone), and you pick 84 (swing at) at random. What are the chances that out of the 84 marbles you chose, more than 34 are red (in the strike zone)? You can determine the probability of picking more than 34 red marbles using a hypergeometric distribution.
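If you want to check the marble math yourself, here is a minimal Python sketch that sums the hypergeometric probabilities directly using only the standard library. The function names are my own, and this is not necessarily how the figure quoted above was computed; if that figure is right, the printed value should land close to 0.84.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def prob_more_than(threshold, population, successes, draws):
    """P(more than `threshold` successes among the draws)."""
    upper = min(successes, draws)
    return sum(hypergeom_pmf(k, population, successes, draws)
               for k in range(threshold + 1, upper + 1))

# Numbers from the marble problem: 116 red + 139 green marbles, 84 picks.
in_zone, out_of_zone, swings = 116, 139, 84
population = in_zone + out_of_zone

p = prob_more_than(34, population, in_zone, swings)
print(f"P(more than 34 in-zone swings at random) = {p:.4f}")
```

For a sanity check, a random swinger would expect about 84 × 116 / 255 ≈ 38 in-zone swings, so beating the hitters’ actual 34 most of the time is what you would predict.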

How is it even possible to make major league players look so confounded (see gif above)?

The worst approach at the plate (other than sabotaging yourself) is just swinging at random. There is an 84% chance that the approach of these players is worse than random. A possible explanation is that hitters are actually trying to swing at more of the pitches outside the strike zone. This sounds like a really stupid strategy, because it is. The only reason hitters should do this is if they were able to crush the knuckle curve when it’s outside the strike zone. Hitters haven’t crushed any of the knuckle curves (an anemic .029 ISO), and they are barely ever hitting it when it’s outside the zone. It makes you wonder if Betances is using Jedi mind tricks.

draft4

Assuming that Betances is not a Jedi (if he were, wouldn’t he use his powers on his fastball as well?), then something else has to be going on. From the batter’s reaction you can tell that the batter thought the pitch was going to hit him. So, maybe the batters are just so worried about the 95 mph heater that they are getting surprised by the knuckle curve? Still, Betances threw the pitch 48% of the time; it’s not a surprise pitch. Whatever Betances is doing is definitely making hitters look dumbfounded. I don’t know of any other pitch that gets a higher swing rate out of the zone than in it (if you can think of a pitch that gets more swings out of the zone than in, leave it in the comments).

Thanks to Pitcher Gifs for this great gif.

Also, an unrelated useless fact: hitters have exactly a .000 wOBA on plate appearances ending with DB’s knuckle curve.

This is definitely something to keep an eye on and look into further. What makes a pitch look like a ball to the batter when it’s in the strike zone, and look like it’s going to be a strike when it is out of the zone? This is the only pitch I know of that can do both.

I challenge any reader to find a pitch thrown more than 200 times that has a higher O-Swing% than Z-Swing%, and leave the name of the pitcher and the pitch in the comments.

All stats are from FanGraphs PITCHf/x

This article was originally posted at GWRamblings.


Big Data and Baseball Efficiency: the Traveling Salesman had Nothing on a Baseball Scout

The MLB draft is coming up and with any luck I’ll get this posted by Thursday and take advantage of web traffic. I can hope! (ed. note: nope) Anyway, Tuesday on FanGraphs I read a fascinating portrayal of the draft process, laying out the nuts and bolts of how organizations scout for the draft. The piece, written by Tony Blengino (whose essays are rapidly becoming one of my favorite parts of this overall terrific baseball site), describes all the behind the scenes work that happens to prepare a major league organization for the Rule 4 draft. Blengino described the dedication scouts show in following up on all kinds of prospects at the college and high school levels, what they do, how much they need to travel, and especially how much ground they often need to cover to try and lay eyes on every kid in their area.

One neat insight for me was Blengino’s one-word description of most scouts as entrepreneurs. You could think of them almost as founders of a startup, with the kids they scout as the product the scouts are trying to sell to upper layers of management in the organization. As such, everything they can do to get a better handle on a kid’s potential can feed into the pitch to the scouting director.

I respect and envy scouts’ drive to keep looking for the next big thing, the next Jason Heyward or Mike Trout. As Blengino puts it, scouts play “one of the most vital, underrated, and underpaid roles in the game.” One might argue that in MLB, unlike the NFL or NBA, draft picks are typically years away from making a contribution, so how important can they be? Yet numerous studies have shown that the draft presents an incredible opportunity for teams in building and sustaining success. In fact, given that so much of an organization’s success hinges on figuring out which raw kids will be able to translate tools and potential into talent, one could make (and others have made) the argument that scouting is a huge potential market inefficiency for teams to exploit. Although I’ll have a caveat later. But in any case, for a minor league system every team wants to optimize their incoming quality because, like we say in genomic data analysis, “garbage in, garbage out.”

As I was reading this piece, I started thinking about ways to try and create more efficiencies. And I started thinking about Big Data.


Foundations of Batting Analysis – Part 3: Run Creation

I’ve decided to break this final section in half and address the early development of run estimation statistics first, and then examine new ways to make these estimations next week. In Part 1, we examined the early development of batting statistics. In Part 2, we broke down the weaknesses of these statistics and introduced new averages based on “real and indisputable facts.” In Part 3, we will examine methods used to estimate the value of batting events in terms of their fundamental purpose: run creation.

The two main objectives of batters are to not cause an out and to advance as many bases as possible. These objectives exist as a way for batters to accomplish the most fundamental purpose of all players on offense: to create runs. The basic effective averages presented in Part 2 provide a simple way to observe the rate at which batters succeed at their main objectives, but they do not inform us on how those successes lead to the creation of runs. To gather this information, we’ll apply a method of estimating the run values of events that can trace its roots back nearly a century.

The earliest attempt to estimate the run value of batting events came in the March 1916 issue of Baseball Magazine. F.C. Lane, editor of the magazine, discussed the weakness of batting average as a measure of batting effectiveness in an article titled “Why the System of Batting Averages Should be Changed”:

“The system of keeping batting averages…gives the comparative number of times a player makes a hit without paying any attention to the importance of that hit. Home runs and scratch singles are all bulged together on the same footing, when everybody knows that one is vastly more important than the other.”

To address this issue, Lane considered the fundamental purpose of making hits.

“Hits are not made as mere spectacular displays of batting ability; they are made for a purpose, namely, to assist in the all-important labor of scoring runs. Their entire value lies in their value as run producers.”

In order to measure the “comparative ability” of batters, Lane suggests a general rule for evaluating hits:

“It would be grossly inaccurate to claim that a hit should be rated in value solely upon its direct and immediate effect in producing runs. The only rule to be applied is the average value of a hit in terms of runs produced under average conditions throughout a season.”

He then proposed a method to estimate the value of each type of hit based on the number of bases that the batter and all baserunners advanced on average during each type of hit. Lane’s premise was that each base was worth one-fourth of a run, as it takes the advancement through four bases for a player to secure a run. By accounting for all of the bases advanced by a batter and the baserunners due to a hit, he could determine the number of runs that the hit created. However, as the data necessary to actually implement this method did not exist in March 1916, the work done in this article was little more than a back-of-the-envelope calculation built on assumptions concerning how often baserunners were on base during hits and how far they tended to advance because of those hits.

As he wanted to conduct a rigorous analysis with this method, Lane spent the summer of 1916 compiling data on 1,000 hits from “a little over sixty-two games”[i] to aid him in this work. During these games, he would note “how far the man making the hit advanced, whether or not he scored, and also how far he advanced other runners, if any, who were occupying the bases at the time.” Additionally, in any instance when a batter who had made a hit was removed from the base paths due to a subsequent fielder’s choice, he would note how far the replacement baserunner advanced.

Lane presented this data in the January 1917 issue of Baseball Magazine in an article titled similarly to his earlier work: “Why the System of Batting Averages Should be Reformed.” Using the collected data, Lane developed two methods for estimating the run value that each type of hit provided for a team on average. The first method, the one he initially presented in March 1916, which I’ll call the “advancement” method,[ii] counted the total number of bases that the batter and the baserunners advanced during a hit, and any bases that were advanced to by batters on a fielder’s choice following a hit (an addition not included in the first article). For example, of the 1,000 hits Lane observed, 789 were singles. Those singles resulted in the batter advancing 789 bases, runners on base at the time of the singles advancing 603 bases, and batters on fielder’s choice plays following the singles advancing to 154 bases – a total of 1,546 bases. With each base estimated as being worth one-fourth of a run, these 1,546 bases yielded 386.5 runs – an average value of .490 runs per single. Lane repeated this process for doubles (.772 runs), triples (1.150 runs), and home runs (1.258 runs).
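As a quick check of the arithmetic, here is a minimal Python sketch of the advancement calculation for singles. The counts are Lane's, as quoted above; the function and variable names are my own and purely illustrative.

```python
# Lane's "advancement" method for singles, using the counts quoted in the text.
SINGLES = 789                 # singles among the 1,000 hits observed
BATTER_BASES = 789            # bases advanced by the batters themselves
RUNNER_BASES = 603            # bases advanced by runners already on base
FIELDERS_CHOICE_BASES = 154   # bases reached by batters on a later fielder's choice
RUNS_PER_BASE = 0.25          # Lane's premise: four bases make one run


def advancement_value(hits, *base_counts, runs_per_base=RUNS_PER_BASE):
    """Average runs per hit when every base advanced is worth runs_per_base."""
    total_bases = sum(base_counts)
    return total_bases * runs_per_base / hits


single_value = advancement_value(
    SINGLES, BATTER_BASES, RUNNER_BASES, FIELDERS_CHOICE_BASES
)
print(f"{single_value:.3f}")  # 0.490
```

The same function would reproduce the values for doubles, triples, and home runs if the corresponding base counts were plugged in; those counts are not quoted here, so only singles are shown.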

This was the method Lane first developed in his March 1916 article, but at some point during his research he decided that a second method, which I’ll call the “instrumentality” method, was preferable.[iii] In this method, Lane considered the number of runs that were scored because of each hit (RBI), the runs scored by the batters that made each hit, and the runs scored by baserunners that reached on a fielder’s choice following a hit. For instance, of the 789 singles that Lane observed, there were 163 runs batted in, 182 runs scored by the batters that hit the singles, and 16 runs scored by runners that reached on a fielder’s choice following a single. The 361 runs “created” by the 789 singles yielded an average value of .457 runs per single. This method was repeated for doubles (.786 runs), triples (1.150 runs), and home runs (1.551 runs).

In March 1917, Lane went one step further. In an article titled “The Base on Balls,” Lane decried the treatment of walks by the official statisticians and aimed to estimate their value. In 1887, the National League had counted walks as hits in an effort to reward batters for safely reaching base, but the sudden rise in batting averages was so off-putting that the method was quickly abandoned following the season. As Lane put it:

“…the same potent intellects who had been responsible for this wild orgy of batting reversed their august decision and declared that a base on balls was of no account, generally worthless and henceforth even forever should not redound to the credit of the batter who was responsible for such free transportation to first base.

The magnates of that far distant date evidently had never heard of such a thing as a happy medium…‘Whole hog or none’ was the noble slogan of the magnates of ’87. Having tried the ‘whole’ they decreed the ‘none’ and ‘none’ it has been ever since…

‘The easiest way’ might be adopted as a motto in baseball. It was simpler to say a base on balls was valueless than to find out what its value was.”

Lane attempted to correct this disservice by applying his instrumentality method to walks. Over the same sample of 63 games in which he collected information on the 1,000 hits, he observed 283 walks. Those walks yielded six runs batted in, 64 runs scored by the batter, and two runs scored by runners that replaced the initial batter due to a fielder’s choice. Through this method, Lane calculated the average value of a walk as .254 runs.[iv]
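The instrumentality arithmetic for singles and walks can be sketched the same way. The inputs are the counts Lane reported; the function name is hypothetical.

```python
def instrumentality_value(events, rbi, batter_runs, fielders_choice_runs):
    """Average runs 'created' per event under Lane's instrumentality method."""
    return (rbi + batter_runs + fielders_choice_runs) / events


# Singles: 163 RBI, 182 runs by the batter, 16 runs by fielder's-choice runners.
single_value = instrumentality_value(789, 163, 182, 16)

# Walks: 6 RBI, 64 runs by the batter, 2 runs by fielder's-choice runners.
walk_value = instrumentality_value(283, 6, 64, 2)

print(f"single: {single_value:.4f}")
print(f"walk:   {walk_value:.4f}")
```

The single comes out a shade above .4575, so the reported .457 appears to be truncated rather than rounded. That is an inference from the arithmetic, not something Lane states.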

Each method Lane used was certainly affected by his limited sample of data. The proportions of each type of hit that he observed were similar to the annual rates in 1916, but the examination of only 1,000 hits made it easy for randomness to affect the calculation, particularly for the low-frequency events. Had five fewer runners been on first base at the time of the 29 home runs observed by Lane, the average value of a home run would have dropped from 1.258 runs to 1.129 runs using the advancement method and from 1.551 runs to 1.379 runs using the instrumentality method. It’s hard to trust values that are so easily affected by a slight change in circumstances.
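To see where the adjusted home run values come from, here is a rough sketch. The totals of 146 bases and 45 runs are not quoted directly; I have back-calculated them from the published averages (1.258 × 29 × 4 and 1.551 × 29), so treat them as assumptions.

```python
HOME_RUNS = 29

# Assumed totals, back-calculated from Lane's averages rather than quoted:
ADVANCEMENT_BASES = 146      # about 1.258 runs * 29 home runs * 4 bases per run
INSTRUMENTALITY_RUNS = 45    # about 1.551 runs * 29 home runs

FEWER_RUNNERS_ON_FIRST = 5
BASES_PER_RUNNER = 3         # a runner on first advances three bases on a home run
RUNS_PER_RUNNER = 1          # and scores one run

adjusted_advancement = (
    (ADVANCEMENT_BASES - FEWER_RUNNERS_ON_FIRST * BASES_PER_RUNNER) / 4 / HOME_RUNS
)
adjusted_instrumentality = (
    (INSTRUMENTALITY_RUNS - FEWER_RUNNERS_ON_FIRST * RUNS_PER_RUNNER) / HOME_RUNS
)

print(f"advancement:     {adjusted_advancement:.3f}")      # 1.129
print(f"instrumentality: {adjusted_instrumentality:.3f}")  # 1.379
```

Removing five runners costs 15 bases under one method and five runs under the other, which is enough to move each average by more than a tenth of a run.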

Lane was well aware of these limitations, but treated the work more as an exercise to prove the merit of his rationale, rather than an official calculation of the run values. In an article in the February 1917 issue of Baseball Magazine titled, “A Brand New System of Batting Averages,” he notes:

“Our sample home runs, which numbered but 29, were of course less accurate. But we did not even suggest that the values which were derived from the 1,000 hits should be incorporated as they stand in the batting averages. Our labors were undertaken merely to show what might be done by keeping a sufficiently comprehensive record of the various hits…our data on home runs, though less complete than we could wish, probably wouldn’t vary a great deal from the general averages.”

In the same article, Lane applied the values calculated with the instrumentality method to the batting statistics of players from the 1916 season, creating a statistic he called Batting Effectiveness, which measured the number of runs per at-bat that a player created through hits. The leaderboard he included is the first example of batters being ranked with a run average since runs per game in the 1870s.

Lane didn’t have a wide audience ready to appreciate a run estimation of this kind, and it gained little notoriety going forward. In his March 1916 article, Lane referenced an exchange he had with the Secretary of the National League, John Heydler, concerning how batting average treats all hits equally. Heydler responded:

“…the system of giving as much credit to singles as to home runs is inaccurate…But it has never seemed practicable to use any other system. How, for instance, are you going to give the comparative values of home runs and singles?”

Seven years later, by which point Heydler had become President of the National League, the method to address this issue was chosen. In 1923, the National League adopted the slugging average—total bases on hits per at-bat—as its second official average.

While Lane’s work on run estimation faded away, another method to estimate the run value of individual batting events was introduced nearly five decades later in the July/August 1963 issue of Operations Research. George R. Lindsey, a Canadian military strategist with a passion for baseball, wrote an article for the journal titled “An Investigation of Strategies in Baseball.” In this article, Lindsey proposed a novel approach to measure the value of any event in baseball, including batting events.

The construction of Lindsey’s method began by observing all or parts of 373 games from 1959 through 1960 by radio, television, or personal attendance, compiling 6,399 half-innings of play-by-play data. With this information, he calculated P(r|T,B), “the probability that, between the time that a batter comes to the plate with T men out and the bases in state B,[v] and the end of the half-inning, the team will score exactly r runs.” For example, P(0|0,0), that is, the probability of exactly zero runs being scored from the time a batter comes to the plate with zero outs and the bases empty through the end of the half-inning, was found to be 74.7 percent; P(1|0,0) was 13.6 percent, P(2|0,0) was 6.8 percent, etc.

Lindsey used these probabilities to calculate the average number of runs a team could expect to score following the start of a plate appearance in each of the 24 out/base states: E(T,B).[vi] The table that Lindsey produced including these expected run averages reflects the earliest example of what we now call a run expectancy matrix.

With this tool in hand, Lindsey began tackling assorted questions in his paper, culminating with a section on “A Measure of Batting Effectiveness.” He suggested an approach to assessing batting effectiveness based on three assumptions:

“(a) that the ultimate purpose of the batter is to cause runs to be scored

(b) that the measure of the batting effectiveness of an individual should not depend on the situations that faced him when he came to the plate (since they were not brought about by his own actions), and

(c) that the probability of the batter making different kinds of hits is independent of the situation on the bases.”

Lindsey focused his measurement of batting effectiveness on hits. To estimate the run values of each type of hit, Lindsey observed that “a hit which converts situation {T,B} into {T′,B′} increases the expected number of runs by E(T′,B′) – E(T,B).” For example, a single hit in out/base state {0,0} will yield out/base state {0,1}. If you consult the table that I linked above, you’ll note that this creates a change in run expectancy, as calculated by Lindsey, of .352 runs (.813 – .461). By repeating this process for each of the 24 out/base states, and weighting the values based on the relative frequency in which each out/base state occurred, the average value of a single was found to be 0.41 runs.[vii] This was repeated for doubles (0.82 runs), triples (1.06 runs), and home runs (1.42 runs). By applying these weights to a player’s seasonal statistics, Lindsey created a measurement of batting effectiveness in terms of “equivalent runs” per time at bat.
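The weighting step can be sketched in a few lines of Python. This is a simplified illustration, not Lindsey's actual procedure: the function name and data layout are my own, and only the two run expectancies quoted above are filled in. Adding the runs that score on the play itself is my reading of how the change should be counted when a hit drives runners in.

```python
def average_hit_value(run_exp, transitions, state_counts):
    """Frequency-weighted average change in run expectancy for one hit type.

    run_exp:      {state: expected runs to the end of the half-inning}
    transitions:  {start_state: (end_state, runs_scored_on_the_play)}
    state_counts: {start_state: how often the hit occurred in that state}
    """
    total = sum(state_counts.values())
    value = 0.0
    for start, count in state_counts.items():
        end, runs_scored = transitions[start]
        change = run_exp[end] - run_exp[start] + runs_scored
        value += change * count / total
    return value


# States are (outs, bases occupied). Only the values from the text appear here.
run_exp = {(0, "empty"): 0.461, (0, "1st"): 0.813}
single = {(0, "empty"): ((0, "1st"), 0)}
counts = {(0, "empty"): 1}

leadoff_single = average_hit_value(run_exp, single, counts)
print(round(leadoff_single, 3))  # 0.352
```

With all 24 states, their transitions, and their observed frequencies supplied, the same function would return the overall average for a hit type, such as Lindsey's 0.41 runs for a single.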

Like with Lane’s methods, the work done by Lindsey was not widely appreciated at first. However, 21 years after his article was published in Operations Research, his system was repurposed and presented in The Hidden Game of Baseball by John Thorn and Pete Palmer—the man who helped make on base average an official statistic just a few years earlier. Using play-by-play accounts of 34 World Series games from 1956 through 1960,[viii] and simulations of games based on data from 1901 through 1977, Palmer rebuilt the run expectancy matrix that Lindsey introduced two decades earlier.

In addition to measuring the average value of singles (.46 runs), doubles (.80 runs), triples (1.02 runs), and home runs (1.40 runs) as Lindsey had done, Palmer also measured the value of walks and times hit by the pitcher (0.33 runs), as well as at-bats that ended with a batting “failure,” i.e. outs and reaches on an error (-0.25 runs). While I’ve already addressed issues with counting times reached on an error as a failure in Part 2, the principle of acknowledging the value produced when the batter failed was an important step forward from Lindsey’s work, and Lane’s before him. When an out occurs in a batter’s plate appearance, the batting team’s expected run total for the remainder of the half-inning decreases. When the batter fails to reach base safely, he not only doesn’t produce runs for his team, he takes away potential run production that was expected to occur. In this way, we can say that the batter created negative value—a decrease in expected runs—for the batting team.

Palmer applied these weights to a player’s seasonal totals, as Lindsey had done, and formed a statistic called Batter Runs reflecting the number of runs above average that a player produced in a season. Palmer’s work came during a significant period for the advancement of baseball statistics. Bill James had gained a wide audience with his annual Baseball Abstract by the early 1980s, and The Hidden Game of Baseball was published in the midst of this new appreciation for complex analysis of baseball systems. While Lindsey and Lane’s work had been cast aside, there was finally an audience ready to acknowledge the value of run estimation.

Perhaps the most important effect of this new era of baseball analysis was the massive collection of data that began to occur in the background. Beginning in the 1980s, play-by-play accounts were being constructed to cover entire seasons of games. Lane had tracked 1,000 hits, Lindsey had observed 6,399 half-innings, and Palmer had used just 34 games (along with computer simulations) to estimate the run values of batting events. By the 2000s, play-by-play accounts of tens of thousands of games were publicly available online.

Gone were the days of estimations weakened by small sample sizes. With complete play-by-play data available for every game over a given time period, the construction of a run expectancy matrix was effectively no longer an estimation. Rather, it could now reflect, over that period of games, the average number of runs that scored between a given out/base state and the end of the half-inning, with near absolute accuracy.[ix] Similarly, assumptions about how baserunners moved around the bases during batting events were no longer necessary. Information concerning the specific effects on the out/base state caused by every event in every baseball game over many seasons could be found with relative ease.

In 2007, Tom M. Tango,[x] Mitchel G. Lichtman, and Andrew E. Dolphin took advantage of this glut of information and reconstructed Lindsey’s “linear weights” method (as named by Palmer) in The Book: Playing the Percentages in Baseball. Tango et al. used data from every game from 1999 through 2002 to build an updated run expectancy matrix. Using it, along with the play-by-play data from the same period, they calculated the average value of a variety of events, most notably eight batting events: singles (.475 runs), doubles (.776 runs), triples (1.070 runs), home runs (1.397 runs), non-intentional walks (.323 runs), times hit by the pitcher (.352 runs), times reached on an error (.508 runs), and outs (-.299 runs). These events were isolated to form an estimate of a player’s general batting effectiveness called weighted On Base Average (wOBA).

Across 90 years, here were five different attempts to estimate the number of runs that batters created, with varying amounts of data, using varying methods of analysis, in varying run scoring environments, and yet the estimations all end up looking quite similar.

Event                 Advancement  Instrumentality  Equivalent Runs  Batter Runs  wOBA
Single                .490         .457             .41              .46          .475
Double                .772         .786             .82              .80          .776
Triple                1.150        1.150            1.06             1.02         1.070
Home Run              1.258        1.551            1.42             1.40         1.397
Non-Intentional Walk  --           .254             --               .33          .323
Intentional Walk      --           .254             --               .33          .179
Hit by Pitch          --           --               --               .33          .352
Reach on Error        --           --               --               -.25         .508
Out                   --           --               --               -.25         -.299

Beyond the general goal of measuring the run value of certain batting events, each of these methods had another thing in common: each method was designed to measure the effectiveness of batters. Lane and Lindsey focused exclusively on hits,  the traditional measures of batting effectiveness.[xi] Palmer added in the “on base” statistics of walks and times hit by the pitcher, while also accounting for the value of those times the batter showed ineffectiveness. Tango et al. threw away intentional walks as irrelevant events when it came to testing a batter’s skill, while crediting the positive value created by batters when reaching on an error.

The same inconsistencies present in the traditional averages for deciding when to reward batters for succeeding and when to punish them for failing are present in these run estimators. In the same way we created the basic effective averages in Part 2, we should establish a baseline for the total production in terms of runs caused by a batter’s plate appearances, independent of whether that production occurred due to batting effectiveness. We can later judge how much of that value we believe was caused by outside forces, but we should begin with this foundation. This will be the goal of the final part of this paper.


[i] In his article the next month, Lane says explicitly that he observed 63 games, but I prefer his unnecessarily roundabout description in the January 1917 article.

[ii] I’ve named these methods because Lane didn’t, and it can get confusing to keep going back and forth between the two methods without using distinguishing names.

[iii] Lane never explains why exactly he prefers this method, and just states that it “may be safely employed as the more exact value of the two.” He continues, “the better method of determining the value of a hit is…in the number of runs which score through its instrumentality than through the number of bases piled-up for the team which made it.” This may be true, but he never proves it explicitly. Nevertheless, the “instrumentality” method was the only one he used going forward.

[iv] This value has often been misrepresented as .164 runs in past research due to a separate table from Lane’s article. That table reflected the value of each hit, and walks, with respect to the value of a home run. Walks were worth 16.4 percent of the value of a home run (.254 / 1.551), but this is obviously not the same as the run value of a base on balls.

[v] The base states, B, are the various arrangements of runners on the bases: bases empty (0), man-on-first (1), man-on-second (2), man-on-third (3), men-on-first-and-second (12), men-on-first-and-third (13), men-on-second-and-third (23), and the bases loaded (123).

[vi] The calculation of these expected run averages involved an infinite summation of each possible number of runs that could score (0, 1, 2, 3,…) with respect to the probability that that number of runs would score. For instance,  here are some of the terms for E(0,0):

E(0,0) = (0 runs * P(0|0,0)) + (1 run * P(1|0,0)) + (2 runs * P(2|0,0)) + … + (∞ runs * P(∞|0,0))

E(0,0) = (0 runs * .747) + (1 run * .136) + (2 runs* .068) + … + (∞ runs * .000)

E(0,0) = .461 runs

Lindsey could have just as easily found E(T,B) by finding the total number of runs that scored following the beginning of all plate appearances in a given out/base state through the end of the inning, R(T,B), and dividing that by the number of plate appearances to occur in that out/base state, N(T,B), as follows:

E(T,B) = Total Runs (T,B) / Plate Appearances (T,B) = R(T,B) / N(T,B)

This is the method generally used today to construct run expectancy matrices, but Lindsey’s approach works just as well.
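A small Python sketch shows why the two approaches agree. The sample of half-inning outcomes below is made up purely for illustration; only the three probabilities and the .461 total in the final lines come from Lindsey's figures quoted above.

```python
from collections import Counter


def expected_runs_from_distribution(runs_per_pa):
    """Lindsey's route: build P(r) first, then sum r * P(r)."""
    n = len(runs_per_pa)
    distribution = {r: c / n for r, c in Counter(runs_per_pa).items()}
    return sum(r * p for r, p in distribution.items())


def expected_runs_from_totals(runs_per_pa):
    """The modern route: total runs R divided by plate appearances N."""
    return sum(runs_per_pa) / len(runs_per_pa)


# Hypothetical runs scored through the end of the half-inning after each
# plate appearance that began in one particular out/base state.
sample = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 3, 0]

via_distribution = expected_runs_from_distribution(sample)
via_totals = expected_runs_from_totals(sample)
print(abs(via_distribution - via_totals) < 1e-12)  # True

# Lindsey's three quoted terms for E(0,0) account for part of the .461 total;
# the remainder must come from innings of three or more runs.
partial = 0 * 0.747 + 1 * 0.136 + 2 * 0.068
remaining = 0.461 - partial
print(f"{partial:.3f} {remaining:.3f}")  # 0.272 0.189
```

Both functions compute the same weighted sum, just grouped differently, which is why the choice between them is a matter of convenience.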

[vii] To simplify his estimations, Lindsey made certain assumptions about how baserunners tend to move during hits, similar to the assumptions Lane made in his initial March 1916 article. Specifically, he assumed that “runners always score from second or third base on any safe hit, score from first on a triple, go from first to third on 50 per cent of doubles, and score from first on the other 50 per cent of doubles.” While he did not track the movement of players in the same detail which Lane eventually employed, the total error caused by these assumptions did not have a significant effect on his results.

[viii] In The Hidden Game of Baseball, Thorn wrote that Palmer used data from “over 100 World Series contests,” but in the foreword to The Book: Playing the Percentages in Baseball, Palmer wrote that “the data I used which ended up in The Hidden Game of Baseball in the 1980s was obtained from the play-by-play accounts of thirty-five World Series games from 1956 to 1960 in the annual Sporting News Baseball Guides.” I’ll lean towards Palmer’s own words, though I’ve adjusted “thirty-five” down to 34 since there were only 34 World Series games over the period Palmer referenced.

[ix] The only limiting factor in the accuracy of a run expectancy matrix in the modern “big data” era is in the accuracy of those who record the play-by-play information and in the quality of the programs written to interpret the data. Additionally, the standard practice when building these matrices is to exclude all data from the home halves of the ninth inning or later, and any other partial innings. These innings do not follow the standard rules observed in every other half-inning, namely that they must end with three outs, and thus introduce bias into the data if included.

[x] The only nom de plume I’ve included in this history, as far as I’m aware.

[xi] Lane didn’t include walks in his Batting Effectiveness statistic, despite eventually calculating their value.


Pitch Win Values for Starting Pitchers – May 2014

Introduction

A few weeks back, I introduced a new method of calculating pitch values using a FIP-based WAR methodology.  That post details the basic framework of these calculations and  can be found here.  This post is simply the May 2014 update of the same data.  What follows is predominantly data-heavy but should still provide useful talking points for discussion.  Let’s dive in and see what we can find.  Please note that the same caveats apply as last month.  We’re at the mercy of pitch classification.  I’m sure your favorite pitcher doesn’t throw that pitch that has been rated as incredibly below average, but we have to go off of the data that is available.  Also, Baseball Prospectus’s PitchF/x leaderboards list only nine pitches (Four-Seam Fastball, Sinker, Cutter, Splitter, Curveball, Slider, Changeup, Screwball, and Knuckleball).  Anything that may be classified outside of these categories is not included.  Also, anything classified as a “slow curve” (here’s looking at you, Yu) is not included in Baseball Prospectus’s curveball data.

Constants

Before we begin, we must first update the constants used in calculation for May.  As a refresher, we need three different constants for calculation: strikes per strikeout, balls per walk, and a FIP constant to bring the values onto the right scale.  We will tackle them each individually.

First, let’s discuss the strikeout constant.  In May, there were 52,100 strikes thrown by starting pitchers.  Of these 52,100 strikes, 5,005 were turned into hits and 15,110 outs were recorded.  Of these 15,110 outs, 4,058 were converted via the strikeout, leaving us with 11,052 ball-in-play outs.  11,052 ball-in-play outs and 5,005 hits sum to 16,057 balls-in-play.  Subtracting 16,057 balls-in-play from our original 52,100 strikes leaves us with 36,043 strikes to distribute over our 4,058 strikeouts.  That’s a ratio of 8.88 strikes per strikeout.  This is up from 8.47 strikes per strikeout in March and April.  Hitters were slightly harder to strike out in May than in the previous two months.
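For anyone who wants to follow the subtraction step by step, here is the same calculation as a short Python sketch. The inputs are the May totals quoted above; the variable names are mine.

```python
# May 2014 totals for starting pitchers, as quoted in the text.
strikes = 52_100
hits = 5_005
outs = 15_110
strikeouts = 4_058

ball_in_play_outs = outs - strikeouts            # outs recorded on balls in play
balls_in_play = ball_in_play_outs + hits         # every strike that was put in play
strikes_not_in_play = strikes - balls_in_play    # strikes left to spread over strikeouts

strikes_per_strikeout = strikes_not_in_play / strikeouts
print(ball_in_play_outs, balls_in_play, strikes_not_in_play)  # 11052 16057 36043
print(f"{strikes_per_strikeout:.2f}")                         # 8.88
```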

The next two constants are much easier to ascertain.  In May, there were 29,567 balls thrown by starters and 1,575 walked batters.  That’s a ratio of 18.77 balls per walk, up from 18.50 balls per walk in March and April.  This data would suggest that hitters were slightly less likely to walk in May than previously.  The FIP subtotal for all pitches in May was 0.75.  The MLB Run Average for May was 4.32, meaning our FIP constant for May is 3.58.
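The remaining two constants can be checked the same way. This sketch uses only the rounded figures quoted above, and the variable names are mine. With those rounded inputs the subtraction gives 3.57 rather than 3.58; presumably the reported constant was computed from unrounded values, though that is my assumption.

```python
# May 2014 totals for starting pitchers, as quoted in the text.
balls = 29_567
walks = 1_575

balls_per_walk = balls / walks
print(f"{balls_per_walk:.2f}")  # 18.77

# FIP constant: league run average minus the FIP subtotal (rounded inputs).
mlb_run_average = 4.32
fip_subtotal = 0.75

fip_constant = mlb_run_average - fip_subtotal
print(f"{fip_constant:.2f}")  # 3.57 from rounded inputs; the text reports 3.58
```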

Constant Value
Strikes/K 8.88
Balls/BB 18.77
cFIP 3.58

 

Pitch Values – May 2014

For reference, the following table details the FIP for each pitch type in the month of May.

Pitch FIP
Four-Seam 4.43
Sinker 4.29
Cutter 4.13
Splitter 4.03
Curveball 4.01
Slider 4.13
Changeup 4.80
Screwball 2.56
Knuckleball 3.38
MLB RA 4.32

As we can see, only two pitches would be classified as below average for the month of May: four-seam fastballs and changeups.  Sinkers also came in right around league average.  Pitchers that were able to stand out in these categories tended to have better overall months than pitchers who excelled at the other pitches.  Now, let’s proceed to the data for the month of May.

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Phil Hughes 0.7 185 Vidal Nuno -0.3
2 Ian Kennedy 0.6 186 Doug Fister -0.3
3 Jose Quintana 0.6 187 Wei-Yin Chen -0.3
4 Tom Koehler 0.5 188 John Danks -0.3
5 Lance Lynn 0.5 189 Mike Minor -0.4

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Mike Leake 0.5 171 Brandon Maurer -0.2
2 Dallas Keuchel 0.4 172 Wandy Rodriguez -0.2
3 Tyson Ross 0.4 173 Tom Koehler -0.2
4 Charlie Morton 0.4 174 Kyle Lohse -0.3
5 Chris Archer 0.4 175 Edinson Volquez -0.6

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Corey Kluber 0.5 74 Shelby Miller -0.1
2 Josh Collmenter 0.4 75 Kevin Correia -0.1
3 Adam Wainwright 0.4 76 Hector Santiago -0.1
4 Jarred Cosart 0.4 77 Brandon McCarthy -0.2
5 Madison Bumgarner 0.3 78 Cliff Lee -0.2

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Masahiro Tanaka 0.3 27 Alfredo Simon -0.1
2 Hisashi Iwakuma 0.2 28 Franklin Morales -0.1
3 Hiroki Kuroda 0.2 29 Clay Buchholz -0.1
4 Jake Odorizzi 0.2 30 Jorge De La Rosa -0.1
5 Ubaldo Jimenez 0.2 31 Danny Salazar -0.2

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Sonny Gray 0.3 160 Clay Buchholz -0.1
2 Brandon McCarthy 0.2 161 Tyler Lyons -0.1
3 Ryan Vogelsong 0.2 162 Dan Straily -0.1
4 Tyler Skaggs 0.2 163 Yordano Ventura -0.1
5 Collin McHugh 0.2 164 Franklin Morales -0.2

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jason Hammel 0.4 120 Robbie Erlin -0.1
2 Ricky Nolasco 0.3 121 Kyle Gibson -0.2
3 Garrett Richards 0.3 122 Julio Teheran -0.2
4 Bud Norris 0.3 123 Johnny Cueto -0.2
5 Edwin Jackson 0.3 124 Yovani Gallardo -0.3

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 0.3 165 Josh Collmenter -0.3
2 Stephen Strasburg 0.3 166 Jake Peavy -0.3
3 Francisco Liriano 0.2 167 Danny Duffy -0.3
4 Henderson Alvarez 0.2 168 Drew Smyly -0.3
5 Eric Stults 0.2 169 Marco Estrada -0.7

Screwball

Rank Pitcher Pitch Value
1 Alfredo Simon 0.0
2 Trevor Bauer 0.0
3 Hector Santiago 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 0.6
2 C.J. Wilson 0.0

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 1.2 192 Edinson Volquez -0.3
2 Mike Leake 1.1 193 Alfredo Simon -0.3
3 Jason Hammel 1.0 194 CC Sabathia -0.3
4 Dallas Keuchel 1.0 195 Franklin Morales -0.4
5 Masahiro Tanaka 0.9 196 Marco Estrada -0.7

Pitch Ratings – May 2014

Four-Seam Fastball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jason Hammel 60 86 Brandon Maurer 38
2 Aaron Harang 60 87 John Danks 36
3 Phil Hughes 59 88 Trevor Bauer 35
4 Yordano Ventura 59 89 Rafael Montero 35
5 Jose Quintana 59 90 Mike Minor 28

Sinker

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jeff Samardzija 58 71 Alfredo Simon 41
2 Jake Arrieta 58 72 Kyle Lohse 39
3 Aaron Harang 58 73 Ricky Nolasco 37
4 Blake Treinen 58 74 James Shields 37
5 Matt Shoemaker 57 75 Edinson Volquez 22

Cutter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Josh Tomlin 60 26 Ryan Vogelsong 46
2 Corey Kluber 60 27 Josh Beckett 45
3 Franklin Morales 59 28 Dan Haren 44
4 David Price 58 29 Kevin Correia 41
5 Jorge De La Rosa 58 30 Jesse Chavez 40

Splitter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jake Odorizzi 60 10 Ricky Nolasco 54
2 Masahiro Tanaka 59 11 Tim Lincecum 53
3 Wei-Yin Chen 58 12 Kyle Kendrick 46
4 Ubaldo Jimenez 57 13 Dan Haren 43
5 Alex Cobb 57 14 Jorge De La Rosa 40

Curveball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Felix Hernandez 60 61 Roenis Elias 42
2 John Lackey 59 62 Tommy Milone 41
3 Collin McHugh 58 63 Wei-Yin Chen 40
4 Jose Fernandez 58 64 Yordano Ventura 36
5 Mike Minor 58 65 Scott Carroll 35

Slider

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Yu Darvish 61 46 Jeremy Guthrie 40
2 Jhoulys Chacin 61 47 Homer Bailey 38
3 Corey Kluber 60 48 Julio Teheran 35
4 Edwin Jackson 60 49 Yovani Gallardo 31
5 Gavin Floyd 59 50 Kyle Gibson 30

Changeup

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Stephen Strasburg 59 59 Hector Noesi 33
2 Wade Miley 58 60 Cesar Ramos 30
3 Justin Verlander 58 61 Josh Collmenter 26
4 Francisco Liriano 57 62 Ian Kennedy 23
5 Anibal Sanchez 57 63 Marco Estrada 20

Screwball

Rank Pitcher Pitch Rating
1 Alfredo Simon 57
2 Hector Santiago 56
3 Trevor Bauer 56

Knuckleball

Rank Pitcher Pitch Rating
1 R.A. Dickey 55

Monthly Discussion

As we can see, Felix Hernandez ascended to the throne for this month, riding the overall quality of his entire repertoire. Hernandez was classified as throwing five different pitches in May (Four-Seam, Sinker, Curveball, Slider, and Changeup) and managed to earn at least 0.1 WAR in each category. His best two pitches were his Sinker (0.4 WAR) and Changeup (0.3 WAR). The most valuable pitch overall in May was the Four-Seam Fastball thrown by Phil Hughes. The least valuable was Marco Estrada’s changeup. As for offspeed pitches, R.A. Dickey’s 0.6 WAR from his knuckleball led the way. Excluding Dickey’s knuckleball due to the sheer number of times it was thrown, the most valuable offspeed pitch was Jason Hammel’s slider. The least valuable fastball was Edinson Volquez’s sinker.

On our 20-80 scale pitch ratings, the highest rated qualifying pitch was Yu Darvish’s slider. Unsurprisingly, the lowest rated was Marco Estrada’s changeup. It’s difficult to generate -0.7 WAR with a single pitch unless it was just awful. The highest rated fastball was Jake Odorizzi’s splitter, and the lowest rated fastball was Edinson Volquez’s sinker.

Pitch Values – 2014 Season

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Ian Kennedy 1.0 210 Doug Fister -0.3
2 Phil Hughes 1.0 211 Marco Estrada -0.3
3 Michael Wacha 0.9 212 Eric Stults -0.3
4 Jose Quintana 0.9 213 Dan Straily -0.4
5 Lance Lynn 0.7 214 Mike Minor -0.4

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Cliff Lee 1.0 195 Mike Pelfrey -0.3
2 Charlie Morton 0.9 196 Edinson Volquez -0.3
3 Felix Hernandez 0.8 197 Erasmo Ramirez -0.3
4 Dallas Keuchel 0.8 198 Dan Straily -0.3
5 Justin Masterson 0.7 199 Wandy Rodriguez -0.3

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Madison Bumgarner 0.7 88 Shelby Miller -0.2
2 Adam Wainwright 0.7 89 Brandon McCarthy -0.2
3 Corey Kluber 0.7 90 Felipe Paulino -0.2
4 Clay Buchholz 0.5 91 Johnny Cueto -0.3
5 Josh Collmenter 0.4 92 C.J. Wilson -0.3

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Masahiro Tanaka 0.5 27 Jorge De La Rosa -0.1
2 Tim Hudson 0.3 28 Alfredo Simon -0.2
3 Hisashi Iwakuma 0.2 29 Franklin Morales -0.2
4 Hiroki Kuroda 0.2 30 Clay Buchholz -0.2
5 Wei-Yin Chen 0.2 31 Danny Salazar -0.3

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jose Fernandez 0.6 182 Ivan Nova -0.1
2 Sonny Gray 0.6 183 Bronson Arroyo -0.2
3 A.J. Burnett 0.5 184 Clay Buchholz -0.2
4 Brandon McCarthy 0.5 185 Franklin Morales -0.2
5 Stephen Strasburg 0.4 186 Felipe Paulino -0.3

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Edwin Jackson 0.5 139 Yovani Gallardo -0.2
2 Bud Norris 0.5 140 Tim Lincecum -0.2
3 Jason Hammel 0.4 141 Jeremy Guthrie -0.2
4 Aaron Harang 0.4 142 Erasmo Ramirez -0.2
5 Garrett Richards 0.4 143 Danny Salazar -0.4

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Stephen Strasburg 0.5 191 Matt Cain -0.2
2 Francisco Liriano 0.5 192 Danny Duffy -0.3
3 Felix Hernandez 0.4 193 Drew Smyly -0.3
4 Eric Stults 0.4 194 Wandy Rodriguez -0.4
5 John Danks 0.4 195 Marco Estrada -0.6

Screwball

Rank Pitcher Pitch Value
1 Alfredo Simon 0.0
2 Trevor Bauer 0.0
3 Hector Santiago 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 1.1
2 C.J. Wilson 0.0

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 1.8 216 Franklin Morales -0.4
2 Adam Wainwright 1.7 217 Dan Straily -0.4
3 Corey Kluber 1.6 218 Felipe Paulino -0.5
4 Aaron Harang 1.5 219 Marco Estrada -0.7
5 Jeff Samardzija 1.5 220 Wandy Rodriguez -0.8

Year-to-Date Discussion

If we look at the year-to-date numbers, Felix Hernandez still sits in the top spot. Current AL and NL FIP leaders Corey Kluber and Aaron Harang rank third and fourth respectively. The least valuable starter has been Wandy Rodriguez. On a per-pitch basis, the most valuable pitch has been R.A. Dickey’s knuckleball, which should be the case for much of the season due to the heavy pitch totals. Other than Dickey, the most valuable pitch has been Ian Kennedy’s four-seam fastball. I guess there’s something to the idea of throwing a lot of fastballs in an extreme pitcher’s park after all. The most valuable offspeed pitch has been Jose Fernandez’s curveball. The fact that he still tops this list even after being injured and missing starts is simply astounding. Get healthy, Jose, we all miss your brilliance. The least valuable pitch has been Marco Estrada’s changeup. The least valuable fastball has been Mike Minor’s four-seam. Qualitatively, I feel fairly encouraged by the year-to-date results so far. The leaderboard is topped by two no-doubt aces, with the current FIP leaders coming in right behind them. For reference, the top five in the year-to-date overall rankings are currently 1st, 6th, 2nd, 14th, and 22nd on the FanGraphs WAR leaderboards respectively. Please feel free to provide feedback in the comments section.


Peter O’Brien’s Raw Power: Estimating Batted-Ball Velocities in the Minor Leagues

On May 20th Peter O’Brien hit a massive home run to straightaway center, clearing the 32 foot tall batter’s eye at Arm & Hammer Park more than 400 feet from home plate. O’Brien is currently 1 home run behind Joey Gallo in what looks to be an exciting competition for the minor league home run title. O’Brien isn’t as highly touted a prospect as Gallo, but he still has some of the most impressive power in the minor leagues. Reggie Jackson saw O’Brien’s home run and said it was one of the hardest-hit balls in the minor leagues that he had ever seen (and Reggie knows a thing or two about tape measure home runs).

How hard was that ball actually hit? It is impossible to figure out exactly how hard and how far the ball was hit from the available information. You can, however, use basic physics to make a reasonable estimate.

Below I explain the assumptions and thought process I used to get to an estimate of how hard the ball was hit. If that does not interest you, then just skip to the end to find out what it takes to impress Reggie Jackson. But if you’re curious or skeptical, stick around.

OBSERVATIONS

I started off by watching the video to see what information I could gather (O’Brien’s at bat starts at the 37 second mark in the video).

TIME OF FLIGHT – From the crack of the bat to the ball leaving the park, it appears to take 5 seconds. If you watched the video, you can tell this is not a perfect measurement since the camera doesn’t track the ball very closely. If you think you have a better estimate, let me know and I’ll rework the numbers.

LOCATION LEAVING THE PARK – The ball was hit to straightaway center. From the park dimensions we know that when it left the park it was 407 feet from home plate and at least 32 feet in the air to clear the batter’s eye.

ASSUMPTIONS

COEFFICIENT OF DRAG (Cd) – The Cd determines how much a ball will slow down as it moves through the air. I chose 0.35 for the Cd because it is right in the middle of the most frequently inferred Cd values for the home runs that Alan Nathan was looking at in this paper. In looking at the Cds of baseballs, Alan Nathan showed there is reason to believe that there is some significant (meaning greater than what can be explained by random measurement error) variation in Cd from one baseball to another.

ORIGIN OF BALL – I assume the ball was 3.5 feet off the ground and 2 feet in front of home plate when it was hit. These are the standard parameters in Dr. Nathan’s trajectory calculator. But what if the location is off by a foot? The effects of the origin on the trajectory are translational. One foot up, one foot higher. One foot down, one foot lower. The other observations and assumptions are more significant in determining the trajectory of the home run.

Using these assumptions and the trajectory calculator, I was able to determine the minimum speed and backspin a ball would need in order to clear the 32 foot batter’s eye 5 seconds after being hit at different launch angles. The table below shows the vertical launch angle (in degrees), the backspin (in RPM) and the speed of the batted ball (in MPH).

Vertical launch angle (degrees) Back spin (RPM) Speed off bat (MPH)
19 14121 101
21 6817 101.9
23 4155 102.75
25 2779 103.69
27 1940 104.7
29 1375 105.89
30 1156 106.5
32 805 107.88
34 536 109.4
36 322 111.1
38 149 112.99
40 4 115.1

The graph shows a more visual representation of the trajectories in the table above (with the batter’s eye added in for reference).

http://i1025.photobucket.com/albums/y314/GWR87/OBrienhomerun_zpsb1507cf4.png

Looking at the graph you will notice that all of these balls would be scraping the top of the batter’s eye.  This makes sense because the table shows the minimum velocities and back spins needed for the ball to exactly clear the batter’s eye.
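For readers who want to experiment with the drag assumption, here is a minimal sketch of a two-dimensional trajectory integration. It is not Dr. Nathan’s trajectory calculator and will not reproduce the table: the air density, ball mass and radius, the time step, the constant lift coefficient `cl` standing in for backspin, and the function name `position_at` are all assumptions made for illustration. Only the Cd of 0.35 and the starting point (2 feet in front of the plate, 3.5 feet high) come from the text above.

```python
import math

# Assumed physical constants (not from the article)
RHO = 1.2                      # air density, kg/m^3
MASS = 0.145                   # baseball mass, kg
RADIUS = 0.0369                # baseball radius, m
AREA = math.pi * RADIUS ** 2   # cross-sectional area, m^2
G = 9.81                       # gravity, m/s^2
FT = 0.3048                    # metres per foot
MPH = 0.44704                  # metres per second per mph


def position_at(t_end, speed_mph, angle_deg, cd=0.35, cl=0.0, dt=0.001):
    """Return (x_ft, y_ft) of the ball t_end seconds after contact.

    cd is the drag coefficient; cl is a constant lift coefficient used as a
    crude stand-in for backspin (an assumption, real lift varies with spin).
    """
    theta = math.radians(angle_deg)
    v0 = speed_mph * MPH
    x, y = 2.0 * FT, 3.5 * FT            # origin of the ball, from the text
    vx, vy = v0 * math.cos(theta), v0 * math.sin(theta)
    k = 0.5 * RHO * AREA / MASS          # shared factor in drag and lift
    for _ in range(int(round(t_end / dt))):
        v = math.hypot(vx, vy)
        # Drag opposes the velocity; lift acts perpendicular to it.
        ax = -k * cd * v * vx - k * cl * v * vy
        ay = -G - k * cd * v * vy + k * cl * v * vx
        x += vx * dt
        y += vy * dt
        vx += ax * dt
        vy += ay * dt
    return x / FT, y / FT


# Where is a 103.69 MPH, 25 degree ball after 5 seconds with drag only?
print(position_at(5.0, 103.69, 25))
```

Raising `cl` lifts the ball at the 5 second mark, which is the same trade-off the table expresses between backspin and speed off the bat.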

What is the slowest O’Brien could have hit the ball?

If you were in a rush, looking at the table you would think the slowest O’Brien could have hit the ball would be 101 MPH at 19°. But not so fast! The amount of backspin required for the ball to travel on that trajectory is humanly impossible.

What is a reasonable backspin?

I am highly skeptical of backspin values greater than 4,000 rpm based on the Baseball Prospectus article by Alan Nathan, “How Far Did That Fly Ball Travel?” The backspin on the home runs Nathan examined ranged from 500 to 3,500 rpm, with most falling around 2,000. The first 3 entries in the table have backspins of over 4,000 rpm and can be eliminated as possibilities. If the ball with the 19° launch angle had only 3,500 rpm of backspin, it would have hit the batter’s eye less than 11 feet off the ground instead of clearing it. Maybe you’re skeptical that I eliminated the 3rd entry because it’s close to the 4,000 rpm cutoff. Think about it this way: if a player were able to hit a ball with over 4,000 rpm of backspin, they would have to be hitting at a much higher launch angle than 23° (higher launch angles generate greater spin while lower launch angles generate less spin).

The high launch angle trajectories with very little backspin (like the bottom three in the table) are also not very likely. A ball hit with a 40° launch angle would almost certainly have more than 4 rpm of backspin. If the ball hit with the 40° launch angle had 1,000 rpm of backspin (instead of 4), it would have been 70 feet off the ground, easily clearing the 32 foot batter’s eye.

Accounting for reasonable backspin, the slowest O’Brien could have hit the ball is 103.69 MPH at 25° with 2,779 rpm of backspin.
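The elimination above can be written as a simple filter over the table. This is a sketch rather than part of the original analysis: the 4,000 rpm ceiling is the cutoff from the text, while the 500 rpm floor is an assumption borrowed from the low end of the range Nathan observed, chosen because it removes the same bottom three rows. The names `ROWS` and `slowest_plausible` are illustrative.

```python
# (launch angle in degrees, backspin in rpm, speed off bat in MPH), from the table
ROWS = [
    (19, 14121, 101.0), (21, 6817, 101.9), (23, 4155, 102.75),
    (25, 2779, 103.69), (27, 1940, 104.7), (29, 1375, 105.89),
    (30, 1156, 106.5), (32, 805, 107.88), (34, 536, 109.4),
    (36, 322, 111.1), (38, 149, 112.99), (40, 4, 115.1),
]

MAX_SPIN = 4000  # rpm, the cutoff used in the text
MIN_SPIN = 500   # rpm, assumed floor (low end of Nathan's observed range)


def slowest_plausible(rows, min_spin=MIN_SPIN, max_spin=MAX_SPIN):
    """Return the row with the lowest speed among rows with believable backspin."""
    plausible = [row for row in rows if min_spin <= row[1] <= max_spin]
    return min(plausible, key=lambda row: row[2])


print(slowest_plausible(ROWS))  # (25, 2779, 103.69)
```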

So what do all these observations and assumptions get us?

We can say that the ball was likely hit 103.69 MPH or harder, with a launch angle of 25° or greater. A 103.69 MPH launch velocity is not that impressive; it is essentially the league average launch velocity for a home run. Distance-wise, how impressive a home run was it? Unobstructed, the ball would have landed at least 440 feet from home plate (assuming the 25° scenario). The ball probably went farther than 440 feet because it did not scrape the batter’s eye. So, how rare is a 440+ foot home run? Last year during the regular season there were 160 home runs that went 440 feet or farther out of a total of 4,661 home runs, meaning only 3.4% of all home runs were hit at least that far.

For those of you who wanted to just skip to the end: my educated guess is that the ball went at least 440 feet and left the bat at 103.69 MPH or faster.
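The rarity figure is just the share of long home runs among all home runs. A quick check using the two counts quoted above (the variable names are illustrative):

```python
long_home_runs = 160     # home runs of 440 feet or more last season
total_home_runs = 4661   # all regular-season home runs last season

share = long_home_runs / total_home_runs
print(f"{share:.1%}")  # 3.4%
```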

If you like this, you can read other articles on my blog GWRamblings, or follow me on twitter  @GWRambling

None of this would have been possible without Alan Nathan’s great work on the physics of baseball. I used his trajectory calculator to do this, and I referenced his articles frequently to make sure I wasn’t making stupid assumptions. The information on major league home run distance is based on hittrackeronline.com.


Old Player Premium

One of Dave Cameron’s articles a while back showed payroll allocations by age group and found that over the last five years or so more money is going to players in their prime years while less is being spent on players over 30. That seems to be a logical thing for teams to do, but that trend can only continue for so long. Eventually a point will be reached where older players are undervalued, and it might be possible that we are already there.

There are several things to keep in mind when comparing these age groups, and one of the biggest is survivorship bias. There is a natural attrition over time for players in general. Let’s look at an example; for all of the following I will be using 2012 versus 2013 as a way to see what happens from year to year. To look at survivorship, I took all position players in 2012 and then looked at their contribution in 2013 to see how many disappeared the next year. Players could be missing from the 2013 data due to retirement, demotion, injury, etc. I also took out a small group that played in both seasons but were basically non-factors in 2013. For example, Wilson Betemit played in both seasons, but in 2013 he had only 10 plate appearances. The attrition rate for the age groups looks like this:

Age Group % of 2012 Players That Did Not Contribute in 2013
18-25 22.2%
26-30 25%
31-35 29.3%
36+ 38.9%
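The attrition rates in the table above could be computed along these lines. This is a sketch with made-up players, not the data behind the table. The 50 plate appearance threshold for a “non-factor” season is a hypothetical value, since the only cutoff detail given is the Betemit example, and `attrition_rate` is an illustrative name.

```python
def attrition_rate(pa_2012, pa_2013, min_pa=50):
    """Share of 2012 players who did not contribute in 2013.

    pa_2012 and pa_2013 map player name to plate appearances for one age group.
    min_pa is a hypothetical cutoff for counting as a contributor.
    """
    players = list(pa_2012)
    gone = [p for p in players if pa_2013.get(p, 0) < min_pa]
    return len(gone) / len(players)


# Made-up example: B barely played in 2013 and C did not appear at all.
pa_2012 = {"A": 400, "B": 300, "C": 250, "D": 100}
pa_2013 = {"A": 450, "B": 10, "D": 120}
print(attrition_rate(pa_2012, pa_2013))  # 0.5
```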

As you would expect, the attrition rate increases over time.  Players in their late teens and early 20s who make it to the majors are likely to be given opportunities in the near future, but as the age increases the probability of teams giving up on the player, major injury, or retirement goes up.  Players who make it from one group to the next have survived, and that is where the bias comes in.  By the time you get to the 36+ group a significant number of the players are really good because if they weren’t they would not have made it so far.  This ability to survive is also a reason why they should be getting a good chunk of the payroll.  As I will show you, it leads to steady play which teams should pay a premium for.

The next step is looking at performance risk among the groups. To look at this I took each group’s performance in 2012 and compared it to the group’s performance in 2013, again only with survivors from year to year. I looked at both wRC+ and WAR to see whether the hitting component and overall performance behaved differently.

To calculate a risk level, I looked at the standard deviation of each group’s player-by-player differences (2013 minus 2012), but those are not directly comparable across groups. Standard deviation is higher for distributions with higher averages due to scaling issues. For instance, the average 36+ player had a 95 wRC+ in 2012, which is more than 10 wRC+ above the average 18 to 25 year old in the same year. A 10% drop or increase in production is therefore a larger absolute change for the 36+ player, so that group naturally ends up with a higher standard deviation. To take care of this, I calculated the standard deviation of the difference as a % of the 2012 average production and used that as the overall riskiness measure.
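As a sketch of that riskiness measure: take each survivor’s 2013 value minus his 2012 value, compute the standard deviation of those differences, and divide by the group’s 2012 average. The numbers below are made up, the function name `relative_risk` is illustrative, and using the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption.

```python
from statistics import mean, stdev


def relative_risk(stat_2012, stat_2013):
    """Std. dev. of year-to-year differences as a fraction of the 2012 average.

    Both lists hold one value per surviving player, in the same order.
    """
    diffs = [later - earlier for earlier, later in zip(stat_2012, stat_2013)]
    return stdev(diffs) / mean(stat_2012)


# Made-up wRC+ values for three players in one age group.
wrc_2012 = [100, 80, 120]
wrc_2013 = [110, 70, 120]
print(f"{relative_risk(wrc_2012, wrc_2013):.1%}")  # 10.0%
```

Dividing by the 2012 average is what makes groups with different baseline production comparable, in the same way a coefficient of variation does.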

Age Group wRC+ Risk WAR Risk
18-25 56.5% 167.7%
26-30 48.3% 118.9%
31-35 46.4% 140.7%
36+ 35.2% 92.8%

Don’t compare the wRC+ figures to the WAR figures, as there are again scaling issues, but do compare the age groups. The youngest age group is the most volatile, so the younger players are the most uncertain, or most risky. That is what we would expect, as we have all seen prospects flame out. The middle two groups are similarly volatile, with the 31 to 35 group having a slightly lower risk level in hitting for this sample and a higher risk level in overall play according to WAR. More years might need to be compared to see how consistent those groups are relative to each other. The 36+ players are significantly less risky than the other ages. If they decline by 1 standard deviation, it will mean a smaller reduction in performance: less volatile and less risky.

The only thing that really hurts the older players is the aging curve. They are more likely to see a decline in performance. From the youngest group to the oldest, the percent of players who were worse in 2013 than they were in 2012 was 52.3%, 54.5%, 64.4%, and 63.6% by wRC+, and 52.9%, 48.7%, 56.7%, and 81.8% by WAR. So it is more likely that the older players will see performance worse than the previous year, but again a drop for them will likely be smaller due to lower volatility, and it is on average from a higher level of performance to begin with.

Older players are like buying bonds for your investment portfolio: you have a pretty good idea of what they’re going to pay in the next period, with occasional defaults. Younger players are more like growth stocks: you aren’t sure when or if they are going to pay dividends, but when they do you can make huge returns. Investors pay a premium for bonds (accept a lower rate of return) due to their stability, and teams pay more for older players than maybe their production seems to warrant for the same reason.

[Figure: Survivor_zpsee696878.jpg]

If you go back to the payroll allocation, part of the shift is in the number of players in each group. The 31-35 year-olds no longer get the largest chunk of payroll in part because there are more 26 to 30 year-old players. Baseball is getting younger overall, so a larger portion of the money going to younger players is inevitable. The 18 to 25 group isn’t getting a large change in payroll allocation because they are generally under team control, but the teams are extending the players at that age, with the money showing up as they get into the next couple of age groups. Take Chris Sale, who is making $3.5 million this year on the extension he signed (he’s 25); when he is 26, 27, and 28 he will make $6, $9.15, and $12 million respectively.

So the 36+ group, which as you can see makes up only 4.7% of the players, used to make about 20% of the total salaries paid, but now they make 15 or 16% (I don’t have Dave’s exact numbers). Is that premium fair, at three to four times their share of the overall player pool? That is a tough question, and one I am working on. If anyone can give me tips on how to dump lots of player game logs, that is probably what I am going to do next, but I haven’t figured out how to do it without eating up my entire life. Being more certain on this sort of thing, and having a relative risk measure for players, could make contracts a lot easier to understand and predict.
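The size of that premium is the group’s payroll share divided by its share of the player pool. A quick check with the approximate percentages quoted above (the function name `payroll_premium` is illustrative):

```python
def payroll_premium(payroll_share_pct, roster_share_pct):
    """How many times larger a group's payroll share is than its roster share."""
    return payroll_share_pct / roster_share_pct


ROSTER_SHARE = 4.7  # percent of players who are 36+, from the text

for payroll_share in (20, 16, 15):
    print(payroll_share, round(payroll_premium(payroll_share, ROSTER_SHARE), 1))
```

With these inputs the ratio comes out to about 4.3 at the old 20% share and about 3.4 and 3.2 at the current 16% and 15% shares.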