Archive for Player Analysis

Comparing Ben Revere to Nook Logan

Note: Post was written on March 12. Truths may be falsehoods by the time you read this.

So Nook Logan was trending on FanGraphs this morning. Still is, in fact, as of this writing—he is eighth on the “Major League Players” list, sandwiched between Clayton Kershaw and Yasiel Puig—and that piqued my interest. So I looked at his FanGraphs profile. Upon inspection, I found that Logan seems to compare quite favorably to another center fielder who started with an AL Central team and moved to an NL East team: Ben Revere. From a quick glance at his stats, Logan appears to have good speed but no power, and a good glove but no arm: all traits possessed by Mr. Revere.

First, let’s establish each of these characteristics.

Logan stole exactly 23 bases in each of his two full seasons: 2005 with Detroit and 2007 with Washington. Revere has played three full seasons, with at least 34 steals in each of those seasons, plus another 22 in an injury-shortened 2013. So maybe Revere has more speed. According to their Speed Scores (Spd), where Logan is rated at a 7.4 and Revere at a 7.0, Logan is faster, or at least utilizes his speed better. Both are “Excellent” scores, though, and the FG glossary tells us to look at UBR, so we do. “Hmm,” we say. “Revere has a much higher UBR than Logan.” And yes, this is true. And yes, Revere also has higher values in everything related to base running. So we’ll say that Revere is a far superior baserunner despite Logan’s slightly better speed.

“No power” is not hard to determine. Logan has never had an ISO above .089. Revere has only had two ISOs above .100, and he hit a combined one home run in those seasons (his first two minor-league seasons). He was just hitting and running, and getting a lot of triples. Such is the life of a minor-league speedster. But I will take this time to mention something: Revere has a career K rate of 9.1%, only rising above ten percent three times—once in the minors, once in his “cup of coffee” 2010, and once in his injury-plagued 2013. His walk rate, though, is bad. Like really bad. Like it was barely above the amount of fat in my milk last season bad. Logan, on the other hand, had a better (and more consistent) walk rate, hovering around six percent his entire career (save for an 8.6% rate his rookie year). But he struck out a ton. Or, rather, he struck out a normal amount, then got sent down by the Tigers and started striking out a lot more, and continued to strike out at high rates after being dealt to the Nats. I don’t know what to make of this data, but it is a dissimilarity.

Now we turn our attention to a section I didn’t mention in what an English teacher might call my thesis sentence: batted-ball rates. Both men hit a high percentage of ground balls, over 50 percent in all but one season (Logan’s short 2006 campaign, when he hit only 46.7% ground balls, is the lone exception). This doesn’t give us the full picture: while Logan certainly hit his fair share of grounders, he also hit a fair number of fly balls, checking in at 29 percent for his career, with a 7.9% infield-fly rate. Revere, on the other hand, hit so many ground balls that it might be considered unhealthy if he weren’t so doggone fast. Revere has managed to hit only about one-seventh of his balls in play in the air, and only 3.4 percent of those have been popups. So, um, Logan hit more fly balls, but the same number of home runs (two). Granted, prorated to Revere’s sample size, he would have four.

And now, defense: the hardest part to talk about, because there are so many ways to statisticize (that can’t be a word) it and none of them has become the “standard” method of measuring defensive contributions. We’ll only be discussing Logan’s and Revere’s performance in center field, because it is their primary position and also because they have played a not-dissimilar number of games there. When normal people (I’m weird, you’m weird, everybody’m weird) talk about defense, they probably think of errors and assists. Or they just think of how many times they saw that one guy make that one catch—you know, the one where he dives, makes the catch, makes another, leaping, catch at the wall, and then throws out all twelve baserunners, including the guy going from fifth base to shortstop, saving the game and making it onto the Top Ten playlist seven hundred million times. Okay, that was a lot of mumbo-jumbo that basically meant, “normal people don’t think in terms of UZR and TZL, they think in terms of highlight plays and errors.”

And now, the actual discussion of defense. Logan had eight assists, 11 errors, and a .985 FLD% in 306 games. Revere had 13 assists, 13 errors, and a .986 FLD% in 362 games. So it can be said that both are players who mostly do their job, but have little to nothing in the way of arm strength. ARM thinks they’re basically the same player, but UZR thinks Logan is eighty times better on defense than Revere, so okay. They have very similar Fielding values, though. ¯\_(ツ)_/¯

So yeah, maybe this served no point, but I like writing semi-pointless things about semi-obscure players. Maybe you can expect more of the same in the future.


Jacob deGrom Fearless Forecast

Matt Harvey is getting all the hype these days, touching 99 mph on the gun, throwing nasty 84 mph curves, and looking healthy. I think he will have an excellent year. For some reason though, the world at large is still underrating Jacob deGrom.

First off, I recommend you read this FanGraphs article from midsummer, detailing the changes he made to his pitching mechanics to make this “rags to riches” leap into the upper echelon.

I’ve been notoriously high on deGrom since I watched him pitch. I wrote about him on reddit back in July 2014. I’ll update the numbers I used, infra:

He’s been excellent — and not in any flukey kind of way. deGrom’s pitch types and peripherals support that what he did last year is VERY REAL.

Let me reiterate last year’s line: 140 IP (178.1 IP of usage), 9.2 k/9, 2.69 ERA, 1.14 WHIP, 2.67 FIP, 3.03 xFIP, 3.19 SIERA. Those are top-20 numbers. And unlike phenoms that regress with time (see Jesse Hahn in 2014), deGrom only got BETTER as the innings racked up.

That is what we love to see — for three reasons:

(1) His body can withstand the rigors of a 200 IP season,

(2) He IMPROVED, rather than regressing, and

(3) Hey, for those of us in H2H leagues, we want our guy pitching well for the fantasy playoffs!

His control improved with time, along with increased strikeouts. As of my last post, he had an 8.8 k/9 and a 2.7 K/BB. He ended the year with a 9.3 k/9 and a 3.4 K/BB. We love to see improvement in both respects. Keep the walks down and the strikeouts up, and success often naturally follows!

He’s generating a lot of swinging strikes. For reference, the league average sw/str% is approximately 8.6%.

Jacob deGrom has an overall 11.9 sw/str%, which is well above league average. Looking at pitch F/X data, his slider (12.4% sw/str%, 46/370 pitches), changeup (20.2%, 55/272), both fastballs (10.8%, 108/1000), and curveball (16.0%, 34/212) are all above-average, strikeout-quality pitches.
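As a sanity check, these per-pitch rates fall straight out of the whiff and pitch counts quoted above. A minimal sketch (the counts are the ones in the text; the ~8.6% league-average sw/str% is the baseline):

```python
# Recompute deGrom's per-pitch swinging-strike rates from the Pitch F/X
# counts quoted above: sw/str% = whiffs / pitches thrown.
whiffs_and_pitches = {
    "slider":    (46, 370),
    "changeup":  (55, 272),
    "fastballs": (108, 1000),
    "curveball": (34, 212),
}

LEAGUE_AVG_SWSTR = 8.6  # approximate league-average sw/str%, in percent

for pitch, (whiffs, thrown) in whiffs_and_pitches.items():
    swstr = 100.0 * whiffs / thrown
    flag = "above" if swstr > LEAGUE_AVG_SWSTR else "below"
    print(f"{pitch:10s} {swstr:4.1f}% sw/str ({flag} league average)")
```

Every one of the four comes out above the league mark, which is the point.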

deGrom essentially features a five-pitch arsenal. Of 2,225 MLB pitches thrown:

44.9% (1000/2225) Fastballs averaging 93.5 mph, max velocity 97.3 mph.

16.5% (368/2225) 2-Seam Fastballs averaging 93.2 mph, max velocity 97.4 mph.

16.6% (370/2225) Sliders averaging 86.8 mph, max velocity 91.3 mph (adding mph to his slider is a huge part of his success).

12.2% (272/2225) Changeups averaging 83.9 mph.

9.5% (212/2225) Curveballs averaging 79.3 mph.

3 Cutters, not really a pitch he uses.

deGrom has a diverse arsenal of pitches, with some legitimate velocity differentials and a good fastball, topping out at 97+ mph. He has nearly 7 mph between his fastballs and slider, almost 10 mph between fastballs and changeup, 14+ mph between fastballs and curveball, and 22.5 mph between the high end of his fastball range and the low end of his curve.

Essentially, deGrom is legit. His peripherals and Pitch F/X data don’t really suggest that he’s due for any significant regression. Citi Field is still an excellent pitcher’s park, despite the fact that the fences were recently moved in (3-11 feet). I don’t think it will make a significant difference; maybe a home run or two leaves the park that wouldn’t have before.

It’s worth noting that his top speeds increased late in the year, logging his highest speed fastball in the second half of the season. Again, I love a pitcher that doesn’t fatigue.

Concerns: He had Tommy John surgery in 2010, but he seems to have worked his way back from that. A sophomore slump, or hitters figuring him out, is worth considering. And of course, a couple of fly-ball outs might turn into home runs.

Fearless prediction: 32 games, 210 IP, 2.80 ERA, 1.05 WHIP, 234 Ks (10 k/9) – and deGrom finally gains some respect within the fantasy baseball community as a top-15 fantasy pitcher. Bold prediction aside, I think he’s being criminally underrated in fantasy drafts, with an ADP of 112 in Yahoo leagues.

112! At that price, go ahead and reach.


Don’t Hate Dee Because He’s Beautiful

I have every reason to hate Dee Gordon.

Prior to the 2012 season, I found myself struggling to figure out who would get the final keeper slot in a longtime, highly competitive fantasy league I played in. It came down to two players: Mike Trout and Dee Gordon. They both would have cost me the same, but Gordon was coming off a rookie campaign in which he batted .304 with 24 steals in a minuscule 224 at-bats. Trout, on the other hand, was heading into 2012 with what seemed to me like a more clouded future. He had just posted a pedestrian .671 OPS with a 22.2 K%–albeit as a 19-year-old–the year prior. He was also blocked in LF at the time by the great Bobby Abreu, and was looking at possibly another year of seasoning in the minors. In the end I chose Gordon, and the rest is terrible, nightmare-inducing history.

So how strange that I find myself here now, defending Dee Gordon, the very man who hoodwinked me into choosing him over Mike mother-flippin’ Trout.

Ironically, I think the hate for Gordon has gone a bit too far this year. It’s odd to think that there’s any hate for a guy coming off a season where he led all of baseball in steals while also posting a top-25 batting average of .289. But some people seem awfully down on the guy coming into 2015. Perhaps they too were burned by his 2011 breakout, and refuse to make the same mistake twice. Though I can’t fault them if that is the case, there is reason to believe that Dee Gordon’s days of breaking our hearts are over.

Gordon's Batted Ball Percentages 2014

The first thing to point out is his batted-ball profile. As the graph illustrates, there weren’t any earth-shattering changes here. It is worth noting, however, that Gordon set a career high in groundball percentage and a career low in fly-ball percentage. And if you’re willing to consider 2013 an aberration like I am (he managed only 106 plate appearances that year), he has actually been trending gradually in the right direction with both his fly-ball and groundball percentages while maintaining a fairly steady line-drive rate. Spikes in groundball percentage are rarely considered ideal, but when a player has elite speed like Gordon’s, the odds of turning a weak dribbler or a grounder toward the hole into a hit get a very favorable bump.

Which brings me to perhaps the most eyebrow-raising aspect of Gordon’s 2014 season: his bunt-hit percentage (BUH%). After averaging a 28.5 BUH% over the prior three seasons, Gordon posted a ridiculous 42.6 BUH% in 2014. To put that number into perspective, here’s how it stacked up against the league’s other elite speedsters:

2014 BUH% Among Elite Speedsters

Bunting for hits is a skill. The fact that his success rate rose by nearly 15 percentage points last year tells me that he worked on and dramatically improved this skill. Perhaps more importantly, though, it tells me that he’s keenly aware of how dangerous a weapon it can be for him when used effectively. When paired with his declining fly-ball rate–and especially his new career-low IFFB% of 8%, down from 13.2%–the numbers start to paint the picture of a player who may have finally begun to consciously tailor his plate approach to his strengths.

While I will never forgive Dee Gordon for what he did to me, I do see reasons to be optimistic about his 2015 season. Should his elite ability to bunt for hits carry over into this season, his .346 BABIP shouldn’t see as much regression as people seem to think, and another year of plus average and a stolen-base crown seems well within his reach.


Brandon Inge, Superstar

Brandon Inge, Superstar.

How many wins is chemistry worth? Do nice guys really finish last?

As a Pirates fan since birth, I’ve grown used to my baseball fandom engendering a sense of sympathy in others. Born in 1989, I came of baseball-loving age in the mid-nineties, immediately following the halcyon Bonds/Bonilla/Van Slyke & co. days and immediately preceding the less-halcyon days of the Aramis Ramirez-for-Bobby Hill trade, “Operation Shutdown,” the expansion-drafting of Joe Randa, Pat Meares’ general existence, the Moskos pick, the Matt Morris trade . . . (list of soul-crushingly depressing baseball stories truncated for reader’s mental health).

And yet I remained faithful, despite having no conscious memory of a Pirates team being anything other than heartbreakingly awful. I’ve since likened this experience, in conversations with friends, to Linus sitting in the pumpkin patch each year, waiting for the Great Pumpkin to appear. It sometimes seemed that the Great Pumpkin would never come.

It’s ironic, then, that in the year that finally saw the Great Pumpkin arrive in Pittsburgh (2013), the same city also witnessed the end of the career of one Charles Brandon Inge.

Inge, nicknamed ‘Cringe’ by some of the crueler Pittsburgh faithful for his anemic .181/.204/.238 batting line during the 2013 campaign, was at that point in his thirteenth season as one of baseball’s premier utility men, having played every position on the diamond during his career. During his peak, he was a slick-fielding third baseman who also clubbed 27 HRs en route to a 4.1 fWAR season in 2006. But by 2013, Inge was 36 and on his way out of the league. Signed before the season to provide depth behind Pedro Alvarez and Neil Walker, Inge’s poor performance eventually led to his unceremonious release by the Pirates at the end of July.

And yet, this article has less to do with Inge’s on-field merits (which, as the previous paragraph suggests, were both significant and significantly variable), and more to do with Inge’s impact off the field. Inge won the 2010 Marvin Miller Man of the Year Award, given to the player whose “performance and contributions to his community inspire others to higher levels of achievement,” for his work with C.S. Mott Children’s Hospital. A frequent visitor to C.S. Mott, Inge also donated $100,000 for a new infusion center to treat pediatric cancer and twice hit home runs for young cancer patients. Dude’s a nice guy.

Perhaps more relevant, though, is pitcher and noted stathead Brandon McCarthy’s statement that Inge and fellow veteran Jonny Gomes had been worth twenty-four wins to the 2012 Athletics through chemistry alone. Normative ethics aside, it’s impossible to measure the moral character of a man—but we can measure, or at least attempt to quantify, the impact he has on his teammates.

Intrigued, I set out to determine whether Inge, patron saint of chemistry and all-around good guy, really made such a gigantic difference to his teammates’ performance. Mine is not the first investigation into this topic—Baseball Prospectus’ Russell A. Carleton examined the same issue in March of 2013, and there have been numerous attempts to place a valuation on chemistry over the years. But as you’ll see, there are some methodological differences to our approaches, and the differences expose some interesting conclusions.

Methodology

There is no ironclad way to assess Inge’s potential effect on his teammates, short of cloning entire teams of players, randomly assigning Brandon Inges to some of them, and having them play a large number of seasons.

In order to determine Inge’s value as accurately as possible, I can’t simply measure his teammates’ performance—I’d just be concluding that Inge played with good or bad teammates. Instead, I need to develop a counterfactual, or a method of estimating how we could’ve reasonably expected Inge’s teammates to play in his absence. Fortunately, an excellent one already exists—a ZiPS projection. ZiPS, to my knowledge, does not have a ‘played with Brandon Inge variable,’ so it should be unbiased. Carleton instead used an AR(1) covariance matrix to try to adjust for player talent, but given that ZiPS explicitly incorporates past performance with a view to projecting, as accurately as possible, how a player will perform in the upcoming season, I believe it is a suitable tool.

I chose wOBA as the dependent variable for this study—while Carleton looked at multiple indicators (BB%, K%, etc.), one all-encompassing measure of players’ offensive performance seems best suited to answering the question, “Do players perform better with Brandon Inge on their team?”

In order to develop the requisite dataset for this analysis, I downloaded every player-season since 2006[1] from FanGraphs’ leaderboards and filtered the data to include only those players who amassed at least 200 plate appearances. This yielded 3130 player-seasons. Next, I created a binary variable called ‘IngeTeammate,’ with a value of ‘1’ if the player was on Inge’s team during the given season (and not Inge himself), and ‘0’ if he wasn’t. For the 2012 season, the only one in which Inge played for multiple teams, I counted Inge as having played for the Athletics, with whom he spent the majority of the season.
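The filtering and flagging steps can be sketched in a few lines of pandas. The column names, the toy rows, and the team-season lookup below are illustrative, not FanGraphs’ actual export schema:

```python
# Sketch of the dataset construction described above, with a toy frame in
# place of the real FanGraphs download.
import pandas as pd

# (team, season) pairs in which Inge played, 2006-2013 (2012 counted as
# Athletics, per the text). Illustrative, hand-built lookup.
INGE_TEAM_SEASONS = {("Tigers", y) for y in range(2006, 2012)} | {
    ("Athletics", 2012),
    ("Pirates", 2013),
}

df = pd.DataFrame({
    "Name":   ["Austin Jackson", "Jose Bautista", "Brandon Inge"],
    "Team":   ["Tigers", "Blue Jays", "Tigers"],
    "Season": [2010, 2010, 2010],
    "PA":     [675, 683, 346],
})

# Keep only player-seasons with at least 200 plate appearances.
df = df[df["PA"] >= 200].copy()

# 1 if the player shared a team-season with Inge (and is not Inge himself).
df["IngeTeammate"] = [
    int((t, s) in INGE_TEAM_SEASONS and n != "Brandon Inge")
    for n, t, s in zip(df["Name"], df["Team"], df["Season"])
]
print(df[["Name", "IngeTeammate"]])
```

Here Jackson gets flagged as a teammate, Bautista doesn’t, and Inge himself is excluded by the name check.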

The next part was a bit tricky—bringing in the ZiPS projections. The latest years, the ones for which ZiPS has been featured on FG, were easy—data was readily available, wOBA already calculated, and records already associated with a player id. But wading deeper into the past unearthed some issues—in order to match records, I had to manually match player names (including the two Chris Carters, and, apparently, two Abraham Nunezes . . . Nunezii . . . who knows?) and hand-calculate ZiPS-projected wOBA for older player-seasons using the weights provided on the FanGraphs Guts page. One potential issue with some of the oldest data is the lack of projections for things like intentional walks and sacrifice flies.

However, forging through all of the record-matching and manual wOBA-calculating eventually yielded ZiPS wOBA projections matched to 3088 of the 3130 player-seasons. Of the 42 unmatched seasons, only one was an Inge teammate (2010 Brennan Boesch). 81 of the 3088 matched seasons were Inge teammates. So unless you think ZiPS would have pegged Boesch, a relatively unknown 25-year-old at the time, for a significantly better performance than the .322 wOBA he posted in 2010, the unmatched records probably didn’t have a huge effect.

What we’re left with is data that look like this:

Year Name Team Age PA IngeTeammate ZiPSwOBA wOBAdiff wOBA
2010 Jose Bautista Blue Jays 29 683 0 0.322 0.100 0.422
2010 Jim Thome Twins 39 340 0 0.343 0.096 0.439
2010 Wilson Betemit Royals 28 315 0 0.302 0.084 0.386
2010 Josh Hamilton Rangers 29 571 0 0.365 0.080 0.445
2010 Chris Johnson Astros 25 362 0 0.286 0.067 0.353
2010 Carlos Gonzalez Rockies 24 636 0 0.350 0.063 0.413
2010 Justin Morneau Twins 29 348 0 0.387 0.061 0.448
2010 Paul Konerko White Sox 34 631 0 0.361 0.056 0.417
2010 Joey Votto Reds 26 648 0 0.383 0.055 0.438
2010 Danny Valencia Twins 25 322 0 0.299 0.052 0.351
2010 Giancarlo Stanton Marlins 20 396 0 0.305 0.051 0.356
2010 Miguel Cairo Reds 36 226 0 0.288 0.051 0.339
2010 Will Rhymes Tigers 27 213 1 0.288 0.050 0.338
2010 Tyler Colvin Cubs 24 395 0 0.301 0.050 0.351
2010 Michael Morse Nationals 28 293 0 0.328 0.049 0.377
2010 Adrian Beltre Red Sox 31 641 0 0.343 0.048 0.391
2010 Ryan Hanigan Reds 29 243 0 0.321 0.048 0.369
2010 Yorvit Torrealba Padres 31 363 0 0.279 0.044 0.323
2010 Matt Joyce Rays 25 261 0 0.321 0.043 0.364
2010 Aubrey Huff Giants 33 668 0 0.344 0.043 0.387
2010 Drew Stubbs Reds 25 583 0 0.295 0.043 0.338
2010 Andres Torres Giants 32 570 0 0.316 0.042 0.358
2010 Corey Patterson Orioles 30 341 0 0.274 0.042 0.316
2010 Austin Jackson Tigers 23 675 1 0.288 0.041 0.329
2010 Brett Gardner Yankees 26 569 0 0.306 0.040 0.346
2010 Colby Rasmus Cardinals 23 534 0 0.329 0.040 0.369
2010 Andruw Jones White Sox 33 328 0 0.323 0.039 0.362

In the above table, wOBAdiff refers to the amount by which the player outperformed his ZiPS wOBA projection. A negative number would indicate that a player underperformed his projection. So Jose Bautista outperformed his 2010 projection by .100—multiplying by 1000 tells us that this was 100 points of wOBA. It was good to be Joey Bats in 2010.

Results

If we look at the mean wOBA deviation (in terms of points of wOBA) Inge teammates and non-teammates experienced from their ZiPS projections, we see the following results:

             Player-Seasons  Total PA   Mean Weighted Diff. (wOBA pts)  Mean Unweighted Diff. (wOBA pts)
Non-Teammate           3007  1,378,732                           -3.09                            -4.62
Teammate                 81     37,965                            4.30                             4.24

In other words, if we weight by plate appearances, Inge teammates outperformed their ZiPS projections by an average of about 4.30 points of wOBA. All other players underperformed their projections by an average of about 3.09 points. Which might not seem like a lot, but if you were to apply that 7.4 wOBA difference to an average-hitting team over a 6000 PA team-season, that’s roughly 34 runs. So 3.4 wins. Which is, you know, quite a bit. The unweighted version is even more extreme, suggesting that players with lower numbers of PA have outperformed their projections even more when teamed with Inge.
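That runs math can be written out explicitly. This sketch assumes a wOBA scale of roughly 1.3 (it varies a bit by season; see the FanGraphs Guts page) and the usual ~10 runs per win:

```python
# Convert a gap in wOBA points to runs over a team-season:
# runs = (points / 1000) * PA / wOBA_scale.
def woba_points_to_runs(points, pa, woba_scale=1.3):
    return (points / 1000.0) * pa / woba_scale

gap = 4.30 - (-3.09)   # teammates minus non-teammates, in wOBA points
runs = woba_points_to_runs(gap, pa=6000)
print(f"{gap:.2f} wOBA points over 6000 PA ~ {runs:.0f} runs ~ {runs / 10:.1f} wins")
```

With those assumptions, the 7.4-point gap over a 6000 PA team-season comes out to roughly 34 runs, i.e. 3.4 wins, matching the figures above.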

If we simply run a regression including the independent variables IngeTeammate (binary) and age and the dependent variable wOBAdiff (unweighted), we can express the story another way:

wOBAdiff = 0.0127064 + (IngeTeammate* 0.0090544) + (age* -0.0005993)

I included age as a control because ZiPS projections, as you can see from the model above, tended to slightly overproject older players in comparison to younger players, and therefore I needed to consider the possibility that Inge simply benefitted from playing only with young players (he didn’t).

Note that in the model above, 0.001 corresponds to one point of wOBA (i.e. a hitter moving from .323 to .324 would have gained a point of wOBA). The r-squared of the model is absurdly low (0.006), but that’s to be expected—after all, I’m not trying to assert that Brandon Inge is responsible for all or even a significant part of the variation between MLB players’ expected and actual performance. More importantly, the variable ‘IngeTeammate’ is significant at a 98.4% threshold.
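The shape of that fit can be reproduced with a plain least-squares solve. The data below is synthetic, generated from roughly the fitted coefficients plus noise, standing in for the 3,088 real matched player-seasons:

```python
# Minimal sketch of the regression above: ordinary least squares of
# wOBAdiff on an intercept, the IngeTeammate dummy, and age.
import numpy as np

rng = np.random.default_rng(0)
n = 500
inge = rng.integers(0, 2, n)                 # 0/1 teammate dummy
age = rng.integers(21, 40, n).astype(float)  # player age
# Synthetic outcome built from roughly the fitted coefficients, plus noise.
wobadiff = 0.0127 + 0.0091 * inge - 0.0006 * age + rng.normal(0, 0.03, n)

X = np.column_stack([np.ones(n), inge, age])
beta, *_ = np.linalg.lstsq(X, wobadiff, rcond=None)
intercept, b_inge, b_age = beta
print(f"wOBAdiff = {intercept:.4f} + IngeTeammate*{b_inge:.4f} + age*{b_age:.4f}")
```

The recovered IngeTeammate coefficient lands near the 0.009 (nine points of wOBA) used to generate the data, which is all this toy fit is meant to show.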

Considering the possible influence of aging is interesting, as the Inge difference is even more pronounced among younger players, or those whom he allegedly mentored while playing with the A’s. If we filter the data above to include only players 27 and younger, the table looks like this:

             Player-Seasons  Total PA  Mean Weighted Diff. (wOBA pts)  Mean Unweighted Diff. (wOBA pts)
Non-Teammate           1241   568,944                           -0.50                            -2.09
Teammate                 30    14,298                           16.58                            17.27

We’re starting to run into some serious sample size issues that make me uncomfortable drawing any particularly bold conclusions, but young players who play with Inge have done really, really well, collectively knocking the snot out of their ZiPS projections. There are problems with extrapolating this to a 6000 PA team-season, given that presumably an entire team won’t be composed of young players, but if one did so the result would be a ridiculous 78.6 runs of additional value.

The table below lists every 27-and-under player season for which the player was an Inge teammate:

Year Name Team Age PA ZiPSwOBA wOBAdiff wOBA
2008 Matt Joyce Tigers 23 277 0.275 0.084 0.359
2011 Alex Avila Tigers 24 551 0.308 0.076 0.384
2013 Jordy Mercer Pirates 26 365 0.282 0.051 0.333
2010 Will Rhymes Tigers 27 213 0.288 0.050 0.338
2012 Chris Carter Athletics 25 260 0.319 0.050 0.369
2011 Brennan Boesch Tigers 26 472 0.300 0.048 0.348
2007 Curtis Granderson Tigers 26 676 0.344 0.044 0.388
2010 Austin Jackson Tigers 23 675 0.288 0.041 0.329
2012 Yoenis Cespedes Athletics 26 540 0.328 0.040 0.368
2010 Miguel Cabrera Tigers 27 648 0.399 0.032 0.431
2013 Jose Tabata Pirates 24 341 0.308 0.032 0.340
2012 Josh Reddick Athletics 25 673 0.296 0.030 0.326
2013 Andrew McCutchen Pirates 26 674 0.365 0.028 0.393
2013 Starling Marte Pirates 24 566 0.317 0.027 0.344
2006 Omar Infante Tigers 24 245 0.306 0.016 0.322
2008 Curtis Granderson Tigers 27 629 0.358 0.015 0.373
2009 Clete Thomas Tigers 25 310 0.302 0.015 0.317
2012 Josh Donaldson Athletics 26 294 0.286 0.014 0.300
2011 Andy Dirks Tigers 25 235 0.297 0.011 0.308
2013 Neil Walker Pirates 27 551 0.328 0.005 0.333
2013 Pedro Alvarez Pirates 26 614 0.327 0.003 0.330
2006 Curtis Granderson Tigers 25 679 0.335 0.000 0.335
2009 Miguel Cabrera Tigers 26 685 0.407 -0.005 0.402
2010 Alex Avila Tigers 23 333 0.306 -0.007 0.299
2011 Austin Jackson Tigers 24 668 0.315 -0.010 0.305
2012 Jemile Weeks Athletics 25 511 0.304 -0.028 0.276
2012 Derek Norris Athletics 23 232 0.304 -0.029 0.275
2006 Chris Shelton Tigers 26 412 0.380 -0.033 0.347
2013 Travis Snider Pirates 25 285 0.310 -0.039 0.271
2008 Miguel Cabrera Tigers 25 684 0.419 -0.043 0.376

It’s not as if one year is hugely skewing the results—pretty much every year, whichever young players happen to be playing with Brandon Inge outperform their projections. The graph below illustrates the mean wOBA differential younger Inge teammates exhibited each season. I would’ve imagined, prior to viewing these results, that Inge’s positive ‘effect’ might’ve been almost entirely a product of the 2012 Athletics, but this doesn’t seem to be the case—outside of the 2006 Tigers (when Omar Infante, Curtis Granderson, and Chris Shelton collectively underperformed their ZiPS projections by a modest average of ~5 points of wOBA), Inge’s younger teammates have outperformed ZiPS every single year in the sample.

Perhaps, one could say, Inge has simply benefitted from playing on teams run by intelligent front offices. After all, the Tigers, Athletics, and (more recently) the Pirates all have reputations as relatively savvy management teams. Maybe they’re just collectively able to out-forecast ZiPS.

When we look at ZiPS wOBA differentials by team, however, the Tigers (+1.36 points of wOBA), Athletics (+0.11) and Pirates (-0.31) all had weighted mean differentials less than the Inge gap. The average over all teams was -2.89, so while all three front offices ‘beat the market,’ so to speak, they still don’t explain the huge Inge effect. It looks as though there’s something here.

After observing the results for Inge, I was curious about whether other veteran players might also exhibit similar correlations—while we’d expect to find no correlation with ZiPS wOBA differential for most players, it might be the case that, as with Inge, patterns emerge. Specifically, I looked at two players with diametrically opposite reputations—A.J. Pierzynski and Jonny Gomes. Below, I replicate the initial summary table used for the Inge analysis and note the magnitude of the effect:

A.J. Pierzynski

             Player-Seasons  Total PA   Mean Weighted Diff. (wOBA pts)  Mean Unweighted Diff. (wOBA pts)
Non-Teammate           3004  1,375,450                           -2.75                            -4.29
Teammate                 84     41,247                           -7.65                            -7.87

The game’s most hated player didn’t disappoint, as his teammates collectively underperformed their ZiPS projections by an additional 4.9 points of wOBA compared to non-teammates, an effect worth -22.6 runs to the team over the course of a full season. I should note that I assigned Pierzynski to the 2014 Red Sox (with whom he spent considerably more time) instead of the 2014 Cardinals—both teams underperformed their ZiPS projections, but the Red Sox did so by a larger margin.

Pierzynski’s unweighted results, while still negative, are less damning, and using a regressed model reflects this:

wOBAdiff = 0.0128794+ (AJTeammate* -0.0033689) + (age* -0.0005939)

The intercept and coefficient for age are, understandably, almost identical to those I observed in the Inge model. The significance level for AJTeammate, however, is only 64.1%, suggesting that we can’t really conclude much of anything with the same level of confidence as for Inge.

Still, twenty-plus runs is a non-negligible amount, and Pierzynski’s numbers have been negative across all four teams for whom he’s played (White Sox, Rangers, Red Sox, Cardinals). It may be that more historical data would reveal a broader trend, given that we’ve limited our sample size to only the latter half of Pierzynski’s career.

Jonny Gomes

             Player-Seasons  Total PA   Mean Weighted Diff. (wOBA pts)  Mean Unweighted Diff. (wOBA pts)
Non-Teammate           3000  1,376,613                           -3.05                            -4.56
Teammate                 88     40,084                            2.58                             1.52

The phenomenally bearded Gomes, Inge’s running partner in the Brandon McCarthy quote that triggered this analysis, also appears to be a potential chemistry star, though his results are less extreme than Inge’s. His teammates outperformed non-teammates by 5.6 points of wOBA, worth an estimated 26 runs per season.

wOBAdiff = 0.0124387+ (GomesTeammate* 0.0055032) + (age* -0.0005873)

The effect, as with Pierzynski, is not statistically significant—the significance level is 87.4%.

Conclusions

We can’t make firm statements about causality from this analysis, but we can say pretty conclusively that being on the same team as Inge during the last nine years correlates positively with hitting better than ZiPS projects you to hit.

Maybe you don’t believe Inge should get credit for the extra 3.4 wins of value each year. We don’t have a ‘chemistry above replacement’ metric to account for the fact that some other player with a modicum of veteranosity might plausibly show a positive effect if analyzed the same way. And no feasible way to develop one is on the horizon—you can only do this sort of analysis retrospectively, and it requires a large number of plate appearances and player-seasons before any pattern can be said to have emerged. I’m not really arguing that Inge deserves all the credit for his teammates’ overperformance, only that we have reason to believe a nonzero effect may exist.

But let’s entertain, for a minute, the possibility that the 3.4 win-per-season gap we see *is* entirely attributable to Inge. That maybe all the minute, unnoticed interactions between players over the course of a season can add up to improved performance at the plate. The effect could even be greater than 3.4 wins—I didn’t examine pitching and fielding at all. After all, everything we know about human psychology suggests that happier workers are more productive, and I’ve yet to hear any compelling reason that ballplayers constitute an exception. We sometimes, in the analytics community, fall into the trap of assuming that because we can’t measure something accurately, it doesn’t deserve a meaningful place in our analysis. And yet our inability to measure a phenomenon is not proof of its nonexistence—just ten years ago, we lacked meaningful metrics for catcher framing, for instance.

Perhaps Inge contributed more hidden value over the last decade than anyone this side of Jose Molina, and Brandon McCarthy’s twenty-four wins were, if still hyperbole, grounded in a subtle truth. 3.4 wins currently has a market value north of $20M, making Inge a substantially underpaid man over the course of his career.

It’s a shame, on some level, that it’s only after he’s retired that we recognize the unheralded Inge for who he might secretly have been: Brandon Inge, Superstar.

 

[1] Before 2006, I struggled to find ZiPS projections in a readable format to develop the counterfactuals.

Data retrieved from FanGraphs and Baseball Think Factory.


The Most Signature Pitch of 2014

If you were feeling charitable, you could say this post owes a lot to Jeff Sullivan’s recent set of articles examining pitch comps. If you weren’t feeling charitable, you could say this post is a shameless appropriation of his ideas. Either way, you should read those articles! They were very good, and very entertaining, and directly inspired this post. There were seven, in total: here, here, here, here, here, here, and here. I’ll wait.

Back? Good! In the comments of the third article, someone asked Jeff about finding the “most signature” pitch, or the pitch with the worst/fewest comps. Jeff said: “Wouldn’t be surprised if it was Dickey or the Chapman fastball. That math… I’m afraid of that math, but I might make an attempt.” Jeff has looked at unique pitches twice (Carlos Carrasco’s changeup and Odrisamer Despaigne’s changeup, the last two articles linked above), but I wanted to attack the question in a less ad-hoc fashion, looking at all pitches rather than singling some out.

Jeff wasn’t wrong, though – the math is not simple. His methodology doesn’t really work here for a couple reasons. First of all, I’m looking for uniqueness rather than similarity. I could just flip Jeff’s method around and look for high comp scores, like what he did for the Carrasco/Despaigne changeups, but I also want to consider all pitch types. Again, Jeff sort of did this in the Despaigne article, by comparing his changeup to a few different pitch types, but that is not really feasible for every pitch thrown.

What this means is that a new method is needed to directly calculate dissimilarity. We could find the maximum distances from the mean (basically Jeff’s method), which would work for a single pitch type: if all the pitches are clustered together, with similar velocities and breaks, calculating the distance from the mean to find the weirdest pitch makes sense. But consider this hypothetical set of pitches, graphed on two axes for simplicity:

hypothetical pitches

Obviously, the pitch that corresponds to the red point is the sort of thing we’d like to identify as unique. It’s also exactly at the center of that dataset, however, and would show up as the least unique pitch if distance from the mean were used to determine uniqueness. Luckily, there’s an algorithm designed to find outliers in a more rigorous way.

This is where the math gets scary. The algorithm is called Local Outlier Factor analysis, which identifies outliers in a dataset based on the density of data around that point as compared to its neighbors. In this context, the density around a point is a function of how similar the best comps are for each pitch. Each point gets a score, where anything near 1 indicates normal, and higher values indicate greater isolation. I’m not going to go into detail, but if anyone wants to learn more, feel free to ask in the comments, or just Google it. It’s fairly simple to run it on all pitches, with the relevant variables of velocity, horizontal break, and vertical break.
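To make the idea concrete, here is a minimal pure-Python sketch of the LOF computation for a handful of pitches. This is an illustration of the algorithm, not the exact implementation behind the tables below; the function name and toy data are my own, and a real run would standardize velocity and the two break components first so that no single axis dominates the distance metric.

```python
import math

def lof_scores(points, k=3):
    """Local Outlier Factor for a small dataset (illustrative sketch).

    points: list of tuples such as (velocity, h_break, v_break),
    ideally standardized before use. Returns one LOF score per point:
    values near 1 are "normal"; larger values mean greater isolation.
    """
    n = len(points)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # k nearest neighbors (indices) and k-distance for every point
    neigh, kdist = [], []
    for i in range(n):
        order = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        neigh.append([j for _, j in order[:k]])
        kdist.append(order[k - 1][0])

    # local reachability density: inverse of the mean reachability distance
    def lrd(i):
        reach = [max(kdist[j], dist(points[i], points[j])) for j in neigh[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF = average ratio of each neighbor's density to the point's own density
    return [sum(lrds[j] for j in neigh[i]) / (k * lrds[i]) for i in range(n)]
```

Points belonging to a tight cluster come out with scores near 1, while an isolated point gets a much larger score, which is exactly the property exploited here.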

Any pitch thrown more than 100 times in 2014 was included, and righties and lefties were considered separately (since pitches that move the same way obviously are very different based on what side of the rubber they come from). But enough about methodology! Here are the top five most signature pitches, for righties and lefties, along with their LOF scores, followed by some gratuitous gifs.

RIGHTHANDERS

Name Pitch Velocity H.Mov V.Mov Outlier Score
R.A. Dickey Knuckleball 76.6 0.2 1.6 2.26
Mike Morin Change 73.7 2.0 5.7 2.16
Steven Wright Knuckleball 74.2 0.7 0.3 2.13
David Hale Fourseam 91.9 4.2 5.8 2.04
Pat Neshek Change 70.9 7.0 3.5 1.00

LEFTHANDERS

Name Pitch Velocity H.Mov V.Mov Outlier Score
Aroldis Chapman Fourseam 101.2 3.7 11.1 2.53
Erik Bedard Slider 73.6 2.0 4.1 2.19
Sean Marshall Curve 74.4 9.5 -6.7 1.91
Dan Jennings Fourseam 93.6 4.9 5.8 1.86
Zach Britton Sinker 96.2 8.6 4.7 1.85

 

 

Chapman fastball

It’s nice when things work exactly like you expect them to. The top pitches on the two lists are incredible, and incredibly unique, and while it’s not a surprise to see them here, it does provide some reassurance that this measure is doing what it’s supposed to. Everyone knows about Dickey’s knuckleball, and if anything, it’s underrated by this measure. Since it moves so randomly, the knuckleball’s season averages end up slow and pretty much neutral both horizontally and vertically. While that’s enough to make it show up as very odd under this measure, the individual pitches don’t often follow that straight trajectory, as seen in the above gif. The same can be said for Steven Wright’s knuckleball in third, but it’s nice that this measure still picks both out as unique pitches.

As for Chapman, there’s not that much to say about his fastball that hasn’t already been said. It feels wrong in some way to call his fastball strange, since it is disturbingly direct in practice, but there was truly no pitch like it in 2014. The velocity is the carrying factor behind the massive outlier score, almost a full 2 MPH greater than the next fastest pitch. Interestingly, Chapman’s pitch was the only one in either top five with notably high velocity.

Looking at the weirdest pitches in baseball, what can we conclude about them as a group? First, the pitchers throwing them are generally not bad. While you’d expect someone to be at least halfway decent to get in the position to throw 100 pitches of a single type, the owners of these pitches averaged about 1 WAR in 2014. With eight of these 10 throwing primarily in relief, and having only 710.2 innings collectively, that comes out to a very respectable 2.4 WAR/200.

The pitches themselves varied in usage, from Neshek’s change, thrown 13.4% of the time, to Britton’s sinker, thrown 89.3% of the time. They also varied in effectiveness, as measured by run values, from Neshek’s 3.6/100 to Marshall’s -1.63/100. Overall, the best pitch is probably Chapman’s fastball, followed by Britton’s sinker, given both the results on those pitches and how often they use them, but as a group, these pitches are pretty good. Maybe that isn’t totally surprising, but weird does not necessarily equal effective. Any pitcher could immediately have the weirdest pitch in baseball, if he threw 40 MPH meatballs, but less absurdly, mix and control matter just as much as the movement of the pitch.

Finally, all this stuff tracks fairly well with what Jeff identified previously. Obviously, he called Dickey and Chapman, but he also wrote this article about how Zach Britton’s sinker is pretty much comp-less, and we see that very pitch in fifth for lefthanders. Odrisamer Despaigne’s change was 12th for righthanders. Interestingly, Carrasco’s change is 98th on that same list, indicating this method doesn’t think he’s incredibly unique. Overall, this was mostly just a fun exercise, but maybe there’s more to this list, so if you want to poke around, it’s in a public Google Doc here. And like I said, if you have any questions about the methodology or anything like that, I’d be glad to answer them in the comments.


The Horrors of Jackie Bradley Jr.’s 2014 Season

Jackie Bradley Jr. is not a terrible baseball player, and honestly he probably didn’t have a terrible 2014 season; at least, it wasn’t as bad as people perceived. That, however, is due to his impeccable fielding and good baserunning. What follows will include none of that. Rather, it is a complete and utter breakdown of Bradley’s 2014 hitting performance and the trends he displayed, which, as you might have guessed, are not pretty.

First, it seems important to mention that Bradley’s numbers were great in the minors: not just his fielding, but his hitting as well. After A-ball (Greenville), Bradley never had a wRC+ below 120, and he never had a BB% lower than 10%. Now, BB% is not always predictive, as Chris Mitchell has shown through his KATOH metric; KATOH does, however, indicate that BB% is predictive in AA and AAA, and Bradley’s BB% was good at both levels.

Now on to 2014. This was supposed to be Bradley’s big break: his year to replace Jacoby Ellsbury in center and become the next great Red Sox center fielder. None of that happened; Bradley did play good defense, but his offense was atrocious, finishing with a 47 wRC+.

So what happened? How did a player lauded not just for his defense but also for his hitting ability finish the year with a 47 wRC+? First, let’s acknowledge that hitting is extremely difficult, especially at the major-league level. Many components go into hitting, and all of them have an impact on why a hitter hits a certain way. It’s also important to look at how pitchers work a hitter, and I think that’s where we’ll start. Below is a graph of the hard, breaking, and off-speed pitches Bradley faced in 2014.

[graph: hard, breaking, and off-speed pitches Bradley faced in 2014, by month]

From this, it’s pretty evident that pitchers predominantly attacked Bradley with fastballs. This was after all his first major league season, and pitchers will often test young hitters or rookies with fastballs. If the hitter starts to hit the fastball well, then typically a pitcher will make an adjustment. As you can see, no adjustments were made because no adjustments were needed.

Now that we know what pitchers were throwing to Bradley, let’s look at what Bradley did with those pitches. The graph below displays the outcomes of Bradley’s at-bats in 2014.

[graph: outcomes of Bradley’s 2014 at-bats, by month]

This is where my eyes started to hurt. Bradley, as you can see, got off to a good start, but everything fell off quickly after that. In fact, things fell apart so badly that Bradley didn’t get a single extra-base hit in the last two months of the season. While I like this graph for explaining Bradley’s struggles, I think the pie chart below, provided by Baseball Savant, gives an even better sense of just how bad Bradley was in 2014.

[pie chart: outcomes of all pitches to Bradley in 2014, via Baseball Savant]

There are many outcomes that can come from a pitch: a foul, a whiff, a called strike, a ball, a ball in play, and finally a hit. Bradley got a hit considerably less often than any other outcome. This is not a recipe for success. Hold on, let me clarify that. The fact that hits were Bradley’s most infrequent outcome was not the problem; hits were Mike Trout’s most infrequent outcome, after all. The problem was Bradley’s 4.6% hit rate.

Another problem here is that Bradley was simply not putting the ball in play enough, and the balls in play, unfortunately, were not resulting in enough hits (.284 BABIP). This, however, is only one of the problems. To get a better understanding of why Bradley didn’t get enough hits, it seems imperative that we examine where Bradley was hitting the ball. For this, we’ll look at a spray chart provided by Brooks Baseball, to examine exactly where Bradley was hitting the ball, and if there are any consistent trends.

[spray chart: Bradley’s 2014 balls in play, via Brooks Baseball]
Here are the outcomes when Bradley put the ball in play. What is distinctly clear is that Bradley pulled the ball a lot, especially in the infield. He also doesn’t seem to have been hitting many hard ground balls, which would explain his lack of hits in the infield. As you can see, over a full season of baseball Bradley mustered only four hits in the infield and none the other way. The Red Sox have talked about working on Bradley’s swing; they’ve suggested it is too uppercut-y and that he needs to start swinging down on the baseball. From this chart it seems pretty evident why they want to do that: they probably want Bradley to be able to hit the ball the other way, not just in the air but also on the ground, so as to maximize his ability to get hits.

While fixing a swing is important, it addresses only one of the problems. More elements go into hitting, and no one ends up with a 47 wRC+ without some kind of approach problem. This is where we’ll make our final investigation: into Bradley’s plate approach and the tendencies he displayed.

A few factors and components make up a hitter’s approach. One of them is the hitter’s tendency to swing: the more a hitter swings, the less patient he is likely to be, and the less likely he is to have a good approach at the plate. Below is a graph of Bradley’s month-by-month swing percentage on hard, breaking, and off-speed pitches for 2014.

[graph: Bradley’s month-by-month swing% on hard, breaking, and off-speed pitches in 2014]

This, as you might be able to tell, is not good. Bradley’s tendency to swing got gradually worse as the year went on, meaning that he either drifted further from his approach or simply got frustrated. Let’s not panic, however: just because a hitter has a high swing% doesn’t mean that he can’t be a successful hitter, especially if he makes contact on a lot of his swings. Vlad Guerrero was a great hitter and he swung at everything; he also hit everything. So let’s look at Bradley’s whiffs per swing (whiff/swing). Why? Because if you’re swinging a lot, you don’t want a high whiff-per-swing rate: it means that most of the pitches you’re swinging at aren’t going to become hits, and it probably means you’re striking out a lot and chasing a lot of pitches.

[graph: Bradley’s month-by-month whiff/swing rate in 2014]

As you might be able to tell, Bradley swung and missed a lot in 2014. I think it’s also important to note that in the last two months of the year, Bradley’s plate appearances were significantly reduced: he got only 35 plate appearances in August and 36 in September. So while it might seem that Bradley started swinging and missing less in the last month, that came in a very small sample.

Finally, let’s look at Bradley’s overall plate-approach tendencies. What follows is a chart, provided by Brooks Baseball, that examines a player’s overall plate approach. Using PITCHf/x data, it examines his passiveness and his aggressiveness at the plate. It does this through detection theory, which analyzes the decisions one makes in the face of uncertainty. There are essentially two parameters in detection theory, C and d’. C, which is the one used for this chart, reflects the strategy of the response. OK, that’s enough on the subject.

[chart: Bradley’s plate-approach criterion (C) over 2014, via Brooks Baseball]

Just like Bradley’s swing tendencies, his overall plate approach was going in the wrong direction. Throughout the year, Bradley consistently got more and more aggressive, essentially losing what made him a successful hitter in the minors. These are the signs that probably pushed the Red Sox to sign Rusney Castillo out of Cuba to a seven-year deal. They also might be a reason why the Red Sox are in serious talks with the Braves about a potential trade involving Bradley.
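For anyone curious about the math behind that chart, the two detection-theory parameters can be computed from the standard formulas. The sketch below is my own illustration, not the Brooks Baseball implementation; in particular, treating a swing at a hittable pitch as a "hit" and a swing at an unhittable pitch as a "false alarm" is an assumed simplification of how they frame the problem.

```python
from statistics import NormalDist

def detection_params(hit_rate, false_alarm_rate):
    """Standard signal-detection parameters (illustrative sketch only).

    hit_rate / false_alarm_rate must be strictly between 0 and 1. Here a
    "hit" is assumed to be a swing at a hittable pitch and a "false alarm"
    a swing at an unhittable one (my framing, not Brooks Baseball's).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)   # d': discriminability
    c = -(z(hit_rate) + z(false_alarm_rate)) / 2  # C: response criterion
    # Negative C means a bias toward swinging (aggressive);
    # positive C means a bias toward taking (passive).
    return d_prime, c
```

A hitter who swings at 90% of hittable pitches and 50% of unhittable ones comes out with a negative C, which is the kind of drift toward aggressiveness the Brooks chart shows for Bradley.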

That being said, while it is certain that Bradley’s tendencies and approach were all heading in the wrong direction, this doesn’t mean that he can’t turn things around. Players make adjustments all the time, and I’m not sure that these stats are necessarily predictive of future performance. Baseball after all is a game of adjustments, pitchers make adjustments on hitters, and then the hitters counter with their own adjustments. It doesn’t seem that Bradley will ever be a great hitter or even a good hitter, but what he can be is a league-average hitter. I’ve spent a lot of time discussing Bradley’s offense and not nearly enough on his defense. Bradley is a great defensive center fielder, maybe the best, and that has real value. If Bradley can simply become an average hitter, he should have a spot in the majors for many years to come.

All graphs can be found on Brooks Baseball, and the pie chart on Baseball Savant. A lot of the stats can also be found on FanGraphs.


Delayed Overanalysis of Casey Janssen

The Nats signed reliever Casey Janssen, formerly of the Blue Jays, to a one-year, $5-million contract a few weeks ago (feel free to stop reading now to avoid the existential dread associated with over-analyzing Casey Janssen). Overall, it’s hard not to like this pick-up. One year and five million dollars is basically nothing (except when it comes to signing a second baseman), and the Clippard trade certainly left a hole in the bullpen. There was also a recent stretch of time when Janssen was quite good: from 2011-2013, he averaged 57.1 IP and 8 to 9 strikeouts per nine innings with a sub-3.00 FIP. WAR isn’t the best way to measure relievers, but he averaged 1.2 WAR a season over those three years, which put him squarely in the pretty-damn-good category of relief pitchers.

So why did a recently good closer sign for a seemingly below-market sum? Because 2014 was mostly terrible. Strikeouts were way down (5.5 K/9 in 2014 compared to 8.5 in 2013), homers were way up (1.2 HR/9 in 2014 compared to 0.5 in 2013), and his groundball percentage dropped from 48% to 34%. These are all fairly alarming trends for a relief pitcher who is 33 and doesn’t throw very hard (2014 average fastball velocity: 89.3 miles per hour). Every analysis of relief pitchers should contain small-sample-size warnings in all capital letters, but important indicators trending that strongly generally point to something being wrong.

In July of last season, Janssen came down with a particularly awful bout of food poisoning, and he probably came back too quickly. And looking at the mid-season splits, there’s a case to be made that it was the (negative) turning point for the rest of Janssen’s season. Let’s compare:

1st half: 22 IP, 1.23 ERA, 0 HR, 14 Ks, 1 BB, .218 wOBA against
2nd half: 23.2 IP, 6.46 ERA, 6 HR, 14 Ks, 6 BBs, .378 wOBA against

In the first half, Janssen made opposing hitters look like Austin Kearns. In the second half, they all looked like Yasiel Puig. His numbers did take a nosedive in July when he was sick, but they got worse in August, when one would have expected him to be feeling better (or put on the DL to recuperate). It’s impossible for anyone to really know how he was feeling, and whether food poisoning actually was the main cause of Janssen’s second-half struggles. His velocity didn’t change from the first half to the second half, and his strikeout rate remained about the same. The uptick in walks and home runs in the second half is troubling, but maybe first-half Janssen was the fluke, given a year-over-year decrease in velocity (he lost about 0.8 MPH on his fastball from 2013 to 2014) and a decrease in strikeouts. For comparison’s sake, here is an unnamed reliever’s first- and second-half splits in 2014:

1st half: 37 IP, 0.97 ERA, 1 HR, 36 Ks, 11 BBs, .208 wOBA against
2nd half: 25 IP, 6.48 ERA, 3 HR, 23 Ks, 8 BBs, .375 wOBA against

This reliever? Rafael Soriano. There wasn’t an injury narrative to blame for his falling-off-a-cliff second half, but he stunk nonetheless. Screwy things can happen in small samples, which is why we try to avoid over-analyzing them. Janssen may have just had impeccable timing, and his new true-talent level as a command relief pitcher may be that of a 4.00 ERA. But unlike with Soriano, there is a realistic narrative for Janssen that fits the timeline of his struggles. Here’s another first-half/second-half comparison:

1st half: [whiff-rate zone chart]

2nd half: [whiff-rate zone chart]

While his K rate was basically the same from the first half to the second, these charts show that his whiff rates weren’t. Janssen had much more success, both down and up in the zone, in terms of swings and misses earlier in the season. So while his velocity was the same between the halves, it appears that his stuff wasn’t.

Again, in such small samples, it’s impossible to draw any definitive conclusions. It’s true that first-half Janssen looked pretty similar to 2011-2013 Casey Janssen, while second-half Janssen looked more like Brian Bruney. It’s reasonable to look at the splits and say that Janssen’s bout with food poisoning ruined what looked to be a promising season. It’s also reasonable to look at his decrease in velocity and strikeout rate and think this was money not well spent. But in the context of MLB, five million dollars is a paltry sum anyway, so why the hell not?


The Home Run Derby and Second Half Production: A Meta-Analysis of All Players from 1985 to 2013

The “Home Run Derby Curse” has become a popular concept discussed among the media and fans alike. In fact, at the time of writing, a simple Google search of the term “Home Run Derby Curse” turns up more than 180,000 hits, with reports concerning the “Curse” ranging from mainstream media sources such as NBC Sports and Sports Illustrated, to widely read blogs including Bleacher Report and FanGraphs, to renowned baseball analysis organizations like Baseball Prospectus and even SABR.

This article seeks to shed greater light on the question of whether the “Home Run Derby Curse” exists, and if so, what its substantive impact is. Specifically, I ask: do those who participate in the Home Run Derby experience a greater decline in offensive production than players who do not partake in the Derby?

Answering this question is of utmost importance to general managers, field managers, and fans alike. If players who partake in the Derby do experience a decline in offensive production between the first and second halves of the season, those in MLB front offices and dugouts can use this information to mitigate potential slumps. Further, if Derby participation leads to a second half slump, fantasy baseball owners can use this knowledge to better manage their teams. Simply put, knowing the effect of Derby participation on offensive production provides us with a deeper understanding of player production.

The next section of this study will address previous literature concerning the “Home Run Derby Curse,” and will discuss how this project builds upon these studies.

Previous Research

Although a good deal of research has been conducted concerning the “Curse,” the rigor of much of this work is difficult to assess. Many of the previous studies on this issue have used subjective analysis of first- and second-half production of Derby participants in order to assess the effects of the “Curse” (see Carty 2009; Breen 2012; Catania 2013). Although these works have certainly highlighted the need for further research, they are simply not objective enough to definitively address the question of the “Home Run Derby Curse’s” existence.

To date, the most rigorous statistical analysis of the “Curse” is an article by McCollum and Jaiclin (2010), which appeared in Baseball Research Journal. In examining OPS and HR %, McCollum and Jaiclin found a statistically significant relationship between participation in the Derby and a decline in second half production. At the same time, they examined the relationship between first and second half production in years in which players who had previously participated in the Home Run Derby did not participate, and found no statistically significant drop off in production in those years.

At first glance, this appears to be fairly definitive evidence that the “Curse” is real. However, they also found that players’ production in the first half of the season in years in which they participated in the Derby was substantially higher than in those years in which they did not participate. This suggests that players who partake in the Derby are chosen because of extraordinary performances. Based on these findings, McCollum and Jaiclin conjectured that the decline in performance after the Derby for those who participated is likely due to the fact that their performance was elevated in the first half of the season, and the second-half decline is simply regression to the mean.

Despite the strong statistical basis of McCollum and Jaiclin’s work, there are a number of points in this work that need to be addressed. First, McCollum and Jaiclin only examine those players who have participated in the Derby and have at least 502 plate appearances in a season, thus prohibiting direct comparison with those who did not participate in the Derby. At the heart of the “Home Run Derby Curse,” however, is the idea that participants in the Derby experience a second half slump greater than is to be expected of any player.

The question that derives directly from this conception is, do Derby participants experience a slump greater than is to be expected from players who did not participate in the Derby? To sufficiently answer this question, players who participated in the Derby must be compared to those who did not. Due to a methodology that relies upon data of only Derby participants, Jaiclin and McCollum were unable to sufficiently answer this question.

Second, McCollum and Jaiclin use t-tests to test their hypotheses. This method is a solid, objective statistical approach; however, it is not ideal, as it does not allow for the inclusion of control variables. There may be additional factors affecting both Derby participation and second-half production simultaneously, creating a spurious finding. This problem can only be addressed through multivariate regression.
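To illustrate what multivariate regression adds here, below is a minimal ordinary-least-squares sketch in pure Python, solving the normal equations directly. The function and the variable framing (an intercept, a Derby dummy, and one control) are my own illustration; an actual analysis would use a proper statistics package that also reports standard errors.

```python
def ols(X, y):
    """Minimal OLS fit via the normal equations (X'X)b = X'y.

    X is a list of rows such as [1.0, derby_dummy, control], where the
    leading 1.0 is the intercept; illustrative only, since real work
    needs a statistics package for standard errors and diagnostics."""
    k = len(X[0])
    # Build X'X and X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yv for row, yv in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    # Back-substitution
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (xty[r] - sum(xtx[r][c] * beta[c] for c in range(r + 1, k))) / xtx[r][r]
    return beta
```

The coefficient on the Derby dummy then measures the participation effect net of the controls, which is exactly what a t-test on raw group means cannot do.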

The final issue with McCollum and Jaiclin’s work centers on their theoretical expectations and their measures of offensive production. Theoretical extrapolation is absolutely necessary in statistical work, as it informs analysis. Without theoretical expectations, researchers are simply guessing at how best to measure their independent and dependent variables. Very little theoretical explanation of the “Curse” has been put forth in previous work, including McCollum and Jaiclin’s piece, and therefore their measurement of offensive output is not necessarily the best.

This article is an attempt to build upon previous work concerning the “Home Run Derby Curse,” and to address the above issues. In the next section, I will develop a short theoretical framework concerning the “Curse.” Four hypotheses are then derived from this theory, which are tested using expanded data, and a different methodological approach. The results suggest that a “Curse” does exist.

Theoretical Basis of the Home Run Derby Curse

Two main theories have been posited to explain the “Home Run Derby Curse.” First, it has been suggested that participation in the Derby saps players of energy that is necessary to continue to perform well in the second half of the season (Marchman 2010). This theory is summarized well by Marchman, who focused on the particular experience of Paul Konerko, who went deep into the 2002 Home Run Derby in Milwaukee. He wrote:

The strange experience of taking batting practice without a cage under the lights in front of tens of thousands of people left him sore in places he usually isn’t sore, like his obliques and lower back, the core from which a hitter draws his power. Over the second half of the year, he hit just seven home runs, and his slugging average dropped from .571 to .402.

In essence, this theory argues that players who participate in the Derby experience muscle fatigue in those muscles from which power hitting is drawn. Because these fatigued muscles are imperative for power hitting, players who participated in the Derby experience reduced power, and see a drop in power numbers in the second half of the season. Thus, one can hypothesize:

H1.1: Players who participate in the Derby will see a greater decline in their power numbers than players who do not participate in the Derby.

Furthermore, one might expect that a player will experience greater decrease in energy the more swings he takes during the Derby. The logic underpinning this assertion is that using the power hitting muscles for a longer period of time should fatigue them to a greater extent. Thus, a player who takes 10 swings in the Derby should experience less muscle fatigue than a player who takes 50 swings in the Derby. Following this line of reasoning, one should expect that the Derby has a greater effect on players’ second half power hitting performance when they take more swings during the Derby. Since those players who hit more home runs during the Derby take more swings, it can also be hypothesized:

H1.2: Players who hit more home runs in the Derby will see a greater decline in power numbers in the second half of the season than players who hit fewer home runs in the Derby (including those who do not participate).

The second theory of the “Curse” proposes that participation in the Derby leads to players altering their swings (Breen 2011; Catania 2013). It is thought that this altered swing carries over into the second half of the season, affecting players’ offensive output.

Most studies of the “Curse” rarely delve into how players tweak their swings, but it is likely safe to assume that they are changing their approach in the hope of belting as many homers as possible for that one night – developing an even greater power stroke. It is a commonly accepted conjecture that power and strikeouts are positively correlated (see Kendall 2014), meaning that greater power is associated with more strikeouts. This conjecture may not hold for exceptionally talented players (e.g., Hank Aaron, Ted Williams, Mickey Mantle, Willie Mays). However, if we accept this assumption to be correct for the majority of players, it can be stated that if players change their swing to hit more home runs, they should see a corresponding increase in strikeouts in the second half of the season.[i] Thus, it can be hypothesized:

H2.1: Players who participate in the Home Run Derby will experience greater strikeouts per plate appearance in the second half of the season than those who did not participate in the Derby.

As with hypotheses 1.1 and 1.2, it can also be assumed that the effect of participation in the Derby will be greater the more swings an individual takes during the Derby. That is to say, if a player hits more home runs during the Derby, the altered swing he uses during the Derby will be more likely to carry through to the second half of the season. This leads to the hypothesis:

H2.2: Derby participants who hit more home runs in the Derby will experience greater strikeouts per plate appearance in the second half of the season than those who hit fewer home runs in the Derby (including those who do not participate).

In the next section of this study I will discuss the analytical approach, variable operationalization, and the data sources used to address the above four hypotheses.

Data and Analytical Method

Below I will begin with a discussion of the data used in this study. I will then discuss the independent and dependent variables for each hypothesis as well as the control variables used in this study. Finally, I will discuss the methodological approach used in this study.

Data Sources and Structure

Above, it is hypothesized that those players who either participated in the Derby, or performed well in the Derby will see greater offensive decline between season halves than those who either did not participate in the Derby, or struggled in the Derby. In order to properly test these hypotheses one must use data that includes those who participated in the Home Run Derby, and those who did not participate in the Derby.

This paper performs a meta-analysis of all players with at least 100 plate appearances in both the first and second halves of the season from 1985 (the first year in which the Home Run Derby was held) through 2013. This makes the unit of analysis of this paper the player-year. This data excludes observations from 1988 as the Derby was cancelled due to rain. Further, 1994 is also excluded as the second half of the season was cut short due to the players’ strike.

Independent Variables

The main independent variable for hypotheses 1.1, and 2.1 is a dichotomous measure of participation in the Home Run Derby. A player was coded as a 1 if they participated in the Derby and a 0 if they did not participate in the Derby. Between 1985 and 2013 a total of 229 player-years were coded as participating in the Derby.

The independent variable for hypotheses 1.2 and 2.2 is a measure of each player’s success in the Home Run Derby. This is an additive variable denoting the number of home runs each player hit in the Derby in each year. This variable ranges from 0 to 41 (Bobby Abreu in 2005). Those who did not participate in the Derby were coded as 0.[ii]
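The two codings can be combined in one helper. A minimal sketch (the function name and the `None` convention for non-participants are my own, not the author's):

```python
def code_derby_vars(derby_hr):
    """Return (participation dummy, Derby HR total).

    derby_hr is None for a player-year with no Derby appearance,
    otherwise the number of home runs hit in that year's Derby.
    """
    if derby_hr is None:
        return 0, 0          # non-participant: dummy 0, HR total 0
    return 1, derby_hr       # participant: dummy 1, even with 0 HRs

print(code_derby_vars(41))    # Bobby Abreu, 2005 → (1, 41)
print(code_derby_vars(0))     # participated, no HRs (see note [ii]) → (1, 0)
print(code_derby_vars(None))  # did not participate → (0, 0)
```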

Dependent Variables

Hypotheses 1.1 and 1.2 posit that participation in the Derby and greater success in the Derby will lead to decreased power numbers respectively. Power hitting can be measured in numerous ways, the most obvious being home runs per plate appearance (HRPA). However, theoretically, if players are being sapped of energy, this should affect all power numbers, not simply HRPA. Restricting one’s understanding of power to HRPA ignores other forms of power hitting, such as doubles and triples. So as not to remove data variance unnecessarily, one can measure change in power hitting by using the difference between first and second half extra base hits per plate appearance (XBPA) rather than HRPA.

Thus, the dependent variable for hypotheses 1.1 and 1.2 is understood as the difference between XBPA in the first and second halves of the season for each player-year. XBPA is calculated by dividing the number of extra base hits (doubles, triples, and home runs) a player has hit by the number of plate appearances, thus providing a standardized measure of extra base hits for each player-year.

The dependent variable for hypotheses 1.1 and 1.2 was created by calculating the XBPA for the first half of the season and the second half of the season for each player-year. The XBPA for the second half of the season was then subtracted from the XBPA for the first half of the season for each player-year. Theoretically, this variable can range from -1.000 to 1.000. In reality, it ranges from -.116308 to .1098476, with a mean of -.0012872 and a standard deviation of .025814.

The dependent variable for hypotheses 2.1 and 2.2 is the difference between first and second half strikeouts per plate appearance (SOPA) for each player-year. SOPA is calculated by dividing the number of strikeouts a player has by his plate appearances, thus providing a standardized measure of strikeouts for each player-year.

The dependent variable for these hypotheses was created by calculating the SOPA for the first half of the season, as well as the second half of the season, for each player-year. The SOPA for the second half of the season was then subtracted from the SOPA for the first half of the season for each player-year. Theoretically, this variable can range from -1.000 to 1.000. In reality, it ranges from -.1857143 to .1580312, with a mean of -.003198 and a standard deviation of .0378807.
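Both dependent variables follow the same first-half-minus-second-half template. A sketch with hypothetical half-season lines (a positive XBPA difference means power declined after the break; a negative SOPA difference means the strikeout rate rose):

```python
def xbpa(doubles, triples, hr, pa):
    # Extra-base hits per plate appearance.
    return (doubles + triples + hr) / pa

def sopa(so, pa):
    # Strikeouts per plate appearance.
    return so / pa

# Hypothetical first- and second-half lines for one player-year.
first  = {"2B": 25, "3B": 3, "HR": 20, "SO": 60, "PA": 350}
second = {"2B": 12, "3B": 1, "HR": 8,  "SO": 55, "PA": 290}

# DV = first-half rate minus second-half rate, as defined above.
d_xbpa = (xbpa(first["2B"], first["3B"], first["HR"], first["PA"])
          - xbpa(second["2B"], second["3B"], second["HR"], second["PA"]))
d_sopa = sopa(first["SO"], first["PA"]) - sopa(second["SO"], second["PA"])

print(round(d_xbpa, 4))  # → 0.0647 (power dropped after the break)
print(round(d_sopa, 4))  # → -0.0182 (strikeout rate rose after the break)
```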

Control Variables

A number of control variables are included in this study. A dummy variable denoting whether a player was traded during the season is included.[iii] To control for the possible effects of injury in the second half of the season, a dummy variable denoting if a player had a low number of plate appearances in the second half of the season is included.[iv] Further, I include a dummy variable measuring whether a player had a particularly high number of first half plate appearances.[v] Finally, controls denoting observations in which the player played the entire season in the National League,[vi] observations that fall during the “Steroid Era,”[vii] observations that fall in a period in which “greenies” were tolerated,[viii] and observations that fall during the era of interleague play are included.[ix]
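The era-based dummies (notes [vii] through [ix]) reduce to simple year cutoffs. A sketch using the cutoffs the author states:

```python
def era_controls(year):
    # Cutoffs as defined in notes [vii]-[ix] of this paper.
    return {
        "steroids":    int(1990 <= year <= 2005),  # "Steroid Era"
        "greenies":    int(1985 <= year <= 2005),  # amphetamines tolerated
        "interleague": int(1995 <= year <= 2013),  # interleague-play era
    }

print(era_controls(1987))  # → {'steroids': 0, 'greenies': 1, 'interleague': 0}
print(era_controls(2000))  # → {'steroids': 1, 'greenies': 1, 'interleague': 1}
print(era_controls(2010))  # → {'steroids': 0, 'greenies': 0, 'interleague': 1}
```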

Analytical Approach

The main dependent variables used to test the above hypotheses are the difference between first and second half XBPA and the difference between first and second half SOPA. For each of these variables, the data fit, almost perfectly, a normal curve.[x] The theoretical range of each runs from -1 to 1, with an infinite number of possible values in between. Although these variables are bounded rather than ranging over the entire real line, the most appropriate methodological approach for this study is OLS regression.
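The modeling setup can be sketched with ordinary least squares on synthetic data. This is not the author's estimation code; NumPy's least-squares routine stands in for whatever statistics package was actually used, and every number below is made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic predictors: a rare participation dummy and one control.
derby = (rng.random(n) < 0.03).astype(float)
trade = (rng.random(n) < 0.10).astype(float)

# Synthetic DV: difference in XBPA, built with an assumed .008 Derby effect.
y = 0.008 * derby + rng.normal(0, 0.025, n)

# OLS via least squares; columns are intercept, Derby, trade.
X = np.column_stack([np.ones(n), derby, trade])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta.shape)  # one coefficient per column: intercept, Derby, trade
```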

In the next section of this piece, I will report the findings of the tests of hypotheses 1.1 through 2.2. I will then discuss the implications of these findings.

Analysis

This section will begin with the presentation and the discussion of the findings concerning hypotheses 1.1 and 1.2. I will then present and discuss the findings of tests of hypothesis 2.1 and 2.2.

Analysis of Hypotheses 1.1 and 1.2

Column 1 of table 1 shows the results of the test of hypothesis 1.1. The intercept for this test is -.0007, but it is statistically insignificant. This suggests that, with all variables included in the test held at 0, players will see no change in their XBPA between the first and second halves of the season. The coefficient for the “Derby participation” variable is statistically significant at .008. This means that if a player participates in the Derby, he can expect to see his second half XBPA drop by .008.

Of course, there is a possibility that those who participate in the Derby will see a greater drop in their XBPA than the average player because, in order to be chosen for the Derby, a player will have a higher XBPA.[xi] This would then make it more likely that players who participate in the Derby see a greater drop in XBPA than players who do not participate as they regress to the mean. To account for this, the sample can be restricted to players with a high first half XBPA.

The mean first half XBPA for all players (those who do and do not participate in the Derby) between 1985 and 2013 is .0766589. The sample is restricted to only those players above this mean. This is done for the tests displayed in column 2 of table 1. As can be seen, the intercept is statistically significant, with a coefficient of .01. Those who have average or above average XBPA in the first half of the season can expect to see their XBPA drop by .01 after the All-Star Break when all other variables in the model are held equal.

Table 1: The Effect of Home Run Derby Participation on the Difference in XBPA.

                      Full Sample       XBPA > .0766589   XBPA > .1138781
Derby Participation   .008*** (.002)    .002 (.002)       -.004 (.003)
Trade                 .001 (.001)       .002 (.002)       .007 (.006)
Diminished PAs        -.005*** (.001)   -.002** (.001)    .002 (.001)
High 1st Half ABs     -.001 (.001)      -.006*** (.001)   -.008** (.003)
National League       .0003 (.001)      -.001 (.001)      .0002 (.002)
Steroids              -.001 (.002)      -.001 (.002)      .004 (.005)
Greenies              .004** (.002)     .003 (.002)       .003 (.002)
Interleague           .002 (.002)       .00003 (.002)     -.01 (.005)
Intercept             -.0007 (.002)     .01*** (.002)     .04*** (.005)
N                     7,330             3,904             636

Note: Values above represent unstandardized coefficients, with standard errors in parentheses. *p<.05, **p<.01, ***p<.001

Turning to the Derby participation variable, one notices that it is now statistically insignificant, with a coefficient of .002. When restricting the sample to only those who showed average or above average power in the first half of the season, the results show that those who participate in the Derby see no statistically discernible difference in their power hitting compared to those who did not participate.

The variables denoting whether a player had a low number of plate appearances in the second half of the season or a high number of at-bats in the first half are both statistically significant with negative coefficients, meaning that players in either situation will actually see an increase in XBPA.

Although the results in column 2 of table 1 are telling, it may be useful to restrict the sample even further. Those who are selected for the Derby are, for all intents and purposes, the best power hitters in baseball. Therefore, one can restrict the sample to only the best power hitters and compare only those players with a first half XBPA equal to, or above the average for those who participated in the Derby, while, of course, keeping all Derby participants in the sample.

The mean first half XBPA for Derby participants between 1985 and 2013 is .1138781. Tests restricting the sample to those with a first half XBPA of at least .1138781 are displayed in column 3 of table 1. The intercept for these tests is statistically significant with a coefficient of .04, meaning those with a first half XBPA at or above .1138781 can expect to see a drop of .04 in their XBPA after the All-Star Break. The coefficient for the variable measuring participation in the Derby is -.004, but it is statistically insignificant. This suggests that those who participate in the Derby do not see a marked decrease in their power hitting after the Derby when compared to hitters of similar power.

The only variable that shows a statistically significant effect in these tests is that which denotes whether a player had a high number of first half at-bats. As with previous tests, the coefficient for this variable is negative. This suggests that players who have a high number of first half at-bats see an increase in the XBPA between the first and second halves of the season in comparison to those without a high number of first half at-bats.

Columns 1 through 4 in table 2 show the tests of success in the Derby (the number of home runs hit) on the difference between first and second half XBPA. The test with a full sample is displayed in column 1 of table 2. The intercept for this test is statistically insignificant, suggesting that on average, players do not experience a marked change in their XBPA between the first and second halves of the season.

The variable denoting the number of home runs a player hit during the Derby is statistically significant and has a coefficient of .0003. This means that for every home run a participant in the Derby hit he can expect his XBPA after the All-Star Break to decline by .0003 points.

Of course, the relationship between Derby success and the difference in first half and second half XBPA is not likely to be linear, but rather curvilinear. Thus, a measure of home runs hit during the Derby squared should be included. The test including this variable is displayed in column 2 of table 2. The intercept is again statistically insignificant suggesting that when all variables in the model are held at 0, players should not see a marked change in their XBPA between the first and second half of the season.
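Operationally, the curvilinear specification just means adding a squared column to the design matrix. A minimal sketch with made-up Derby HR totals:

```python
# Hypothetical Derby home-run totals (41 is Abreu's 2005 record).
hr_totals = [0, 0, 3, 5, 9, 15, 24, 41]

# Each row: intercept, HR, HR^2 -- the squared column lets the
# fitted effect of Derby home runs bend rather than stay linear.
design = [(1.0, float(hr), float(hr) ** 2) for hr in hr_totals]

print(design[-1])  # → (1.0, 41.0, 1681.0)
```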

Table 2: The Effect of the # of Home Runs Hit in the Derby on the Difference in XBPA.

                    Full Sample                           Restricted Samples
                    Without HR^2       With HR^2          XBPA > .0766589[xii]   XBPA > .1138781[xiii]
Home Run Total      .0003* (.0002)     .001** (.0004)     .00002 (.0002)         -.00002 (.0002)
Home Runs Squared   .                  -.00004* (.00002)  .                      .
Trade               -.001 (.001)       -.001 (.001)       .002 (.002)            .007 (.002)
Diminished PAs      -.005*** (.001)    -.005*** (.001)    -.002** (.002)         .002 (.002)
1st Half ABs        .001 (.001)        .001 (.001)        -.006*** (.001)        -.008*** (.002)
National League     .0003 (.001)       .0003 (.001)       -.001 (.001)           .0002 (.002)
Steroids            -.001 (.002)       -.001 (.002)       -.001 (.002)           .004 (.005)
Greenies            .004** (.002)      .004** (.002)      .003 (.002)            -.003 (.005)
Interleague         .002 (.002)        .002 (.002)        -.00003 (.002)         -.009 (.005)
Intercept           -.001 (.002)       -.001 (.002)       .01 (.002)             .04*** (.006)
N                   7,330              7,330              3,898                  636

Note: Values above represent unstandardized coefficients, with standard errors in parentheses. *p<.05, **p<.01, ***p<.001

 

The effect of success in the Derby remains statistically significant with a coefficient of .001. This means that with each home run a player hits in the Derby his XBPA in the second half of the season will decline by .001. Further, the variable “home runs squared” is statistically significant, and has a coefficient of -.00004. This indicates that the effect of the number of home runs a player hits in the Derby on second half production decreases with more home runs. In essence, hitting 40 home runs during the Derby does not have the same effect on second half offensive production as hitting 30 home runs during the Derby, and so on.
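Taking the reported point estimates at face value (.001 on Derby home runs, -.00004 on home runs squared), the implied marginal effect of one additional Derby home run is .001 + 2(-.00004)·HR, which crosses zero at 12.5 home runs. A quick check:

```python
# Point estimates reported in column 2 of table 2.
b1, b2 = 0.001, -0.00004

def marginal_effect(hr):
    # d(effect)/d(HR) for effect = b1*HR + b2*HR^2
    return b1 + 2 * b2 * hr

turning_point = -b1 / (2 * b2)
print(round(turning_point, 1))  # → 12.5
print(marginal_effect(5) > 0)   # → True: early HRs still add to the decline
print(marginal_effect(20) < 0)  # → True: the effect has flattened and reversed
```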

In terms of control variables, the variable denoting a high number of first half at-bats is statistically significant with a negative coefficient in the tests reported in column 1 of table 2. Further, in the tests reported in column 2 the variable denoting a diminished number of plate appearances in the second half of the season is statistically significant and negative.

As with the tests reported in table 1, restricting the sample to only those players with average and above average first half XBPA may be useful. Column 3 of table 2 shows the results of the test of the effect of success in the Derby on the difference in XBPA between the two halves of the season when the sample is restricted to those with a first half XBPA at or above the league average (.0766589).

The intercept in this test is statistically significant with a coefficient of .012. This means that, when all variables included in this test are held at 0, players with an average or above average first half XBPA notice a decline in the second half XBPA. Importantly, the effect of the number of home runs hit during the Derby is statistically insignificant, meaning that hitting more home runs during the Derby has no statistical effect on the difference between first half and second half XBPA when the sample is restricted to those with average or above average first half XBPA.

Both the variable denoting whether a player had diminished second half plate appearances, and the variable denoting whether a player had a high number of first half at-bats, are statistically significant with negative coefficients. This implies that those who experience a diminished number of second half plate appearances, and those players with a high number of first half at-bats see an increase in their XBPA between the first and second halves of the season.

Column 4 of table 2 restricts the sample based on the mean first half XBPA of those who participated in the Derby. The mean first half XBPA of these players is .1138781. The intercept for this test is statistically significant with a coefficient of .036. The variable measuring success in the Derby is statistically insignificant, meaning that the number of home runs a player hits in the Derby has no statistical effect on the difference between first and second half XBPA when comparing Derby participants to similar power hitters who did not participate in the Derby.

In terms of controls, the variable denoting whether a player had a high number of first half at-bats is again statistically significant with a negative coefficient. This, as with previous tests, suggests that those who have a high number of first half at-bats will experience an increase in XBPA between the first and second halves of the season.

Analysis of Hypotheses 2.1 and 2.2

The results of the tests of hypothesis 2.1 (participation in the Derby will lead to more strikeouts per plate appearance) are displayed in column 1 of table 3. This column shows the relationship between participation in the Derby and the change in SOPA between the first and second halves of the season. As can be seen, the intercept is -.006 and is statistically significant, meaning that, all other things equal, players strike out more often in the second half of the season.

The coefficient for “Derby participation” is -.005 and is statistically significant, meaning that those who participate in the Home Run Derby will see their second half SOPA increase by .005 between halves of the season in comparison to players who do not participate in the Derby. When one takes into account that SOPA should increase by .006 when all other variables are held at 0, this finding suggests that Derby participants should see an increase of .011 in their SOPA between the first and second halves of the season.

Unlike XBPA, there is very little chance that SOPA is associated with selection for the Home Run Derby. Moreover, the average first half SOPA for the entire sample used in this study is .1610312, whereas the mean first half SOPA for those who participated in the Home Run Derby is .1669383. Those who participated in the Derby were actually more likely to strike out in any given plate appearance than those who did not. Essentially, assuming a regression to the mean, it is more likely that Derby participants would see a decrease in SOPA between the first and second halves of the season. These results, however, tell the opposite story and cannot be explained away as a mere statistical anomaly. Therefore, it is unnecessary to restrict the sample, and one can state that hypothesis 2.1 is supported.

Turning to column 2 of table 3, one sees a test of hypothesis 2.2 (the more home runs a player hits in the Derby the smaller the difference between his first and second half SOPA will be). Much like the results in column 1 of table 3, the intercept is -.006 and is statistically significant. Thus, when holding all other variables at 0, one can expect the difference between a player’s first and second half SOPA to increase by .006.

The coefficient for the variable denoting the total number of home runs a player hit during the Derby shows a statistically significant coefficient of -.0005. For every home run a player hit during the Derby, the difference between his first half SOPA and second half SOPA will decrease by .0005.

This relationship, however, is likely curvilinear. In order to account for this likelihood, I include a variable in which the total number of home runs a player hit during the Derby is squared. Column 3 of table 3 reports the results of a test including this squared measure. When this variable is included, the coefficient for the total number of home runs becomes statistically insignificant. This suggests that the total number of home runs a player hit during the Derby is not related to the difference between that player’s first half and second half SOPA. It must be noted, however, that this finding does not negate the result in column 1 of table 3.

Interestingly, a number of control variables show statistical significance in all tests included in table 3. If a player was traded midseason, he can expect the difference between his first half and second half SOPA to shrink, meaning he will see an increase in SOPA in the second half of the season. Further, having a below average number of plate appearances in the second half of the season leads to a decrease in second half SOPA.

Table 3: The Effect of Home Run Derby Participation and the # of Home Runs Hit in the Derby on the Difference in SOPA.

                                     Participation in Derby   Number of Home Runs Hit in Derby
                                     (1 = Participation)      Without HR^2      With HR^2
Derby Participation/Home Run Total   -.005* (.003)            -.0005* (.0002)   -.00001 (.0002)
Home Run Total Squared               .                        .                 -.00002 (.00002)
Trade                                -.004* (.002)            -.004* (.002)     -.005* (.002)
Diminished PAs                       .005*** (.001)           .005*** (.001)    .005*** (.001)
High 1st Half ABs                    .002* (.001)             .002* (.001)      .002 (.001)
National League                      .002 (.001)              .002 (.001)       .002 (.001)
Steroids                             .0003 (.002)             .0003 (.002)      .0004 (.002)
Greenies                             .0003 (.002)             .0003 (.002)      .0004 (.002)
Interleague                          -.002 (.002)             -.001 (.002)      -.001 (.002)
Intercept                            -.006* (.003)            -.006* (.003)     -.007* (.003)
N                                    7,330                    7,330             7,330

Note: Values above represent unstandardized coefficients, with standard errors in parentheses. *p<.05, **p<.01, ***p<.001

The next section of this paper will first place the main findings of this paper in a broader context of the “Home Run Derby Curse.” It will then discuss possible avenues for further research.

Implications

The results above were mixed. In some instances, participation or success in the Derby was statistically related to second half offensive decline, whereas in others there was no relation between Derby participation and changes in offensive production between season halves. When using the full sample (N=7,330), the results showed that Derby participants can expect a greater drop in their XBPA between halves of the season than non-participants, and that those with greater success in the Derby will see a greater drop in their XBPA than those with less success. Further, with the full sample, both Derby participants and those with greater success in the Derby can, on average, expect their second half SOPA to increase more than that of non-participants.

These findings, however, must be discussed in closer detail. As McCollum and Jaiclin (2010) pointed out in their piece, some of these results may be due to the often extraordinary performances of Derby participants in the first half of the season, and any decline is simply a regression to the mean.

In order to address this issue, in testing the effect of Derby participation and success on change in XBPA, I restricted the sample to those who showed above-average and extraordinary performances in the first half of the season. The effect of Derby participation and success on change in XBPA disappeared when the sample was restricted to those who showed average or above-average first halves. This suggests that hypotheses 1.1 and 1.2 are not confirmed, and lends support to McCollum and Jaiclin’s regression to the mean conjecture.

Turning to the relationship between Derby success and change in SOPA between halves, an effect was initially found, suggesting that those who hit more home runs during the Derby tend to see a greater increase in their second half SOPA relative to their first half SOPA. This relationship, however, evaporates when a measure of home runs squared is included. This suggests a lack of robustness in this finding, and thus hypothesis 2.2 cannot be confirmed.

Based upon these findings, it appears that the “Home Run Derby Curse” is more of a Home Run Derby myth. The results concerning Derby participation and SOPA, however, appear to tell a different story. The test of hypothesis 2.1 shows that those who participate in the Derby see a larger increase in their SOPA between halves of the season compared to players who do not participate in the Derby. As stated above, it is unnecessary to restrict the sample based upon first half SOPA because those who participate in the Derby have, on average, a higher first half SOPA than the full sample mean. Thus, the argument that Derby participants have had an exceptionally strong first half does not apply in the case of SOPA.

Simply put, Derby participants do see a statistically significant increase in their SOPA in comparison to non-participants, suggesting that there is some credence to the “Home Run Derby Curse” and that it is caused by players changing their swings. The question that remains, however, is: what is the substantive impact of participation in the Derby on SOPA?

The Substantive Effect of Derby Participation on SOPA

Essentially, Derby participants can expect to see their second half SOPA increase by .005 more points over their first half SOPA than players who do not participate in the Derby. The average first half SOPA for those who did not participate in the Derby is .1608855. The mean number of first half plate appearances for the sample used in this study, excluding Derby participants, is 249.5692. This means that the average Derby non-participant will strike out about 41 times in the first half of the season.

With all variables in the model held equal, the average second half SOPA will be .006 points higher than the average first half SOPA, about .1668855. The mean number of second half plate appearances for the sample used in this study, excluding Derby participants, is 219.5873. Therefore, an average Derby non-participant will strike out about 37 times in the second half of the season. When the two halves are combined, an average player who did not participate in the Home Run Derby can expect to strike out about 78 times.

The mean first half SOPA for a Derby participant is .1669383. The average number of first half plate appearances for Derby participants is 356.9345. Thus, the average Derby participant can expect to strike out about 60 times in the first half of the season.

The mean second half SOPA for a Derby participant can be understood as:

2nd Half SOPA = .1669383+(-α)+(-β)

Where α is the intercept (-.006) and β (-.005) is the coefficient for participation in the Derby. With all other variables held constant at 0, Derby participants can expect a second half SOPA of .1779383. The average number of second half plate appearances for Derby participants is 291.7209. With all variables other than “Derby participation” held equal at 0, those who participate in the Derby can expect about 52 strikeouts in the second half of the season. This suggests that a Derby participant can expect to strike out about 112 times during the season.
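The arithmetic above is easy to verify. The constants are the means and coefficients reported in the text; rounding each half separately, as the text does, reproduces the 60 + 52 = 112 figure:

```python
first_sopa = 0.1669383        # participants' mean first-half SOPA
alpha, beta = -0.006, -0.005  # intercept, Derby coefficient

# DV = first-half SOPA minus second-half SOPA, so:
second_sopa = first_sopa - alpha - beta
print(round(second_sopa, 7))  # → 0.1779383

first_pa, second_pa = 356.9345, 291.7209  # participants' mean PAs
so_1st = round(first_sopa * first_pa)     # → 60
so_2nd = round(second_sopa * second_pa)   # → 52
print(so_1st + so_2nd)                    # → 112
```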

This is a substantial difference in strikeouts. However, in order to accurately assess the true substantive effect of Derby participation, one must use a common number of plate appearances for Derby participants and non-participants alike. For the purposes of this paper, I make the reasonable assumption that players will have about 300 plate appearances in the second half of the season.

Using 300 plate appearances, those who did not participate in the Derby can expect 50 strikeouts in the second half of the season, whereas those who did participate in the Derby can expect 53 strikeouts in the second half of the season.[xiv] This difference of three strikeouts does not seem substantively large.

Further, it must be noted that the coefficient for Derby participation of -.005 is only an estimate, with a 95% confidence interval ranging from -.0002 to -.01. If the true coefficient is -.01, this would amount to about 5 more strikeouts over 300 plate appearances in the second half of the season. If the true coefficient is -.0002, a player who participates in the Derby could expect, all other things held equal, to strike out 2 more times over 300 plate appearances in the second half of the season than a player who did not participate. In essence, the difference in SOPA between halves of the season due to participation in the Derby is statistically significant, but substantively negligible.
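The same arithmetic, run at a common 300 second-half plate appearances for the point estimate and both confidence-interval bounds, reproduces the 3 / 5 / 2 strikeout gaps quoted above:

```python
base_nonpart = 0.1668855   # average non-participant second-half SOPA
first_sopa   = 0.1669383   # participants' mean first-half SOPA
alpha = -0.006             # model intercept

def gap_over_300(beta):
    # Participant second-half SOPA implied by the model, compared with
    # the non-participant baseline, over a common 300 PAs.
    part_2nd = first_sopa - alpha - beta
    return round(300 * part_2nd) - round(300 * base_nonpart)

print(gap_over_300(-0.005))   # point estimate → 3 extra strikeouts
print(gap_over_300(-0.01))    # lower CI bound → 5 extra
print(gap_over_300(-0.0002))  # upper CI bound → 2 extra
```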

Broader Implications and Future Research

Although the effects of Derby participation on SOPA are substantively minimal, the takeaway of this study is that a “Home Run Derby Curse” does exist. Further, the confirmation of H2.1 suggests that Derby participants alter their swings to develop more power during the Derby, and that this affects their swing in the second half of the season.

Regardless of the substantive effects, this is an important finding. If a Derby participant’s swing is altered so greatly that he begins striking out at an even faster rate than non-participants in the second half of the season, we must ask what other effects this altered swing has. Does it increase a Derby participant’s flyball ratio? Are Derby participants more likely to see a drop in batting average and walks?

Beyond these questions, future research into the “Curse” should also focus on how the Derby alters a player’s swing. One possible avenue for future research lies in measuring changes in a hitter’s stance (e.g., the distance between his feet, the angle of his back elbow) after the Derby relative to his stance prior to the Derby.

Works Cited:

J.P. Breen, “The Home Run Derby Curse,” FanGraphs, July 11, 2012, accessible via http://www.fangraphs.com/blogs/the-home-run-derby-curse/.

Jason Catania, “Is there Really a 2nd-Half MLB Home Run Derby Curse?,” Bleacher Report, July 15, 2013, accessible via http://bleacherreport.com/articles/1702620-is-there-really-a-second-half-mlb-home-run-derby-curse.

Derek Carty, “Do Hitters Decline After the Home Run Derby?,” Hardball Times, July 13, 2009, accessible via http://www.hardballtimes.com/do-hitters-decline-after-the-home-run-derby/.

Evan Kendall, “Does more power really mean more strikeouts?,” Beyond the Box Score, January 12, 2014, accessible via http://www.beyondtheboxscore.com/2014/1/12/5299086/home-run-strikeout-rate-correlation.

Tim Marchman, “Exploring the impact of the Home Run Derby on its participants,” Sports Illustrated, July 12, 2010, accessible via http://sportsillustrated.cnn.com/2010/writers/tim_marchman/07/12/hr.derby/.

Joseph McCollum and Marcus Jaiclin, “Home Run Derby Curse: Fact or Fiction?,” Baseball Research Journal 39, no. 2 (2010).

[i] It could be argued that those players who participate in the Derby are also exceptional players, and therefore this conjecture will not be correct for the majority of Derby participants. At first glance, this would appear to create a problem for the analysis in this piece; however, this is not so. It would only present a problem if being exceptional led to a greater positive correlation between a power swing and strikeouts, that is, if a power stroke for exceptional players leads to more strikeouts than a power stroke for average players. If exceptional hitters are less likely to have a positive correlation between power and strikeouts, and Derby participants are exceptional players, we would expect to see a lower strikeout rate among these players when they begin attempting to hit for greater power. Essentially, a violation of this assumption leads to a more conservative measurement.

[ii] Some players who participated in the Derby were coded as “0,” as they did not hit any home runs.

[iii] A player is coded as a 1 if he was traded during the season and a 0 if he was not traded.

[iv] A player is coded as a 1 if the difference in his plate appearances between the first and second halves of the season (Pre All-Star Break PAs – Post All-Star Break PAs) is greater than the observed average in the data (39.8). A player is coded as a 0 if the difference in his plate appearances between the first and second halves of the season is less than the observed average in the data.

[v] A player is coded as a 1 if the number of plate appearances he had during the first half of the season is greater than the observed average in the data (342).

[vi] This variable is a dummy variable, with a player being coded as a 1 if he spent the entire season in the National League, and a 0 if he did not spend the entire season in the National League.

[vii] Although the “Steroid Era” is somewhat difficult to nail down, for the purposes of this paper, it is assumed to run from 1990 through 2005. Therefore, if an observation is in or between 1990 and 2005 it is coded as a 1. If an observation falls outside of this time period it is coded as a 0.

[viii] For the purposes of this paper, the era of “greenies” is deemed to run from 1985 through 2005. Therefore, if an observation is in or between 1985 and 2005 it is coded as a 1. If an observation falls outside of this time period it is coded as a 0.

[ix] Interleague play began in 1995 and continues through present. Therefore, if an observation is in or between 1995 and 2013 it is coded as a 1. If an observation falls outside of this time period it is coded as a 0.

[x] The difference in XBPA variable maintains the same basic distribution when the sample is restricted to those with an XBPA equal to or greater than the league average (.0766589), as well as equal to or greater than the Derby participant average (.1138781).

[xi] The mean first half XBPA for those who participate in the Derby is .114, whereas the mean first half XBPA for those who do not participate in the Derby is .077.

[xii] When this model is run including a variable for “home runs squared” the results remain similar.

[xiii] When this model is run including a variable for “home runs squared” the results remain similar.

[xiv] One could quibble with the estimate of 300 second-half plate appearances; however, it is important to note that a Derby participant’s second-half strikeout total increases over a non-participant’s total by .5 for every 100 plate appearances. Thus, if one were to use 200 plate appearances, the difference in average strikeout totals between Derby participants and non-participants for the second half of the season would be about 2.5; if one were to use 400 plate appearances, it would be about 3.5.
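The arithmetic in note [xiv] implies a linear relationship, sketched below in Python. Note that the 1.5-strikeout intercept is inferred here from the quoted values rather than stated in the paper:

```python
# Sketch of the strikeout-difference arithmetic in note [xiv]. The footnote
# implies a linear relationship: the participant/non-participant gap grows by
# 0.5 strikeouts per 100 second-half plate appearances. The 1.5 intercept is
# inferred from the quoted values (2.5 at 200 PA, 3.5 at 400 PA); it is not a
# number stated in the paper.

def strikeout_gap(second_half_pa):
    """Estimated extra second-half strikeouts for a Derby participant."""
    return 1.5 + 0.5 * (second_half_pa / 100)

for pa in (200, 300, 400):
    print(pa, strikeout_gap(pa))
```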


The Grandyman (Still) Can

For every Dontrelle Willis–who continues to get looks from major-league teams despite more than eight years of complete ineptitude–there exists a handful of other players who fade into relative obscurity only a year or two removed from a dominant season. All it generally takes is a down year resulting from–or paired with–an injury to send a guy spiraling below the radar. These are often the players who can return the most value in fantasy drafts, if you can distinguish between a year that’s an aberration and one that’s a bellwether for a significant, irreversible decline in skills.

While I can’t say with complete confidence that Curtis Granderson’s 2014 doesn’t fall into the latter category, there were a couple of encouraging things going on below the subpar surface stats that make me think he can return some solid value this year, especially considering where he’s going in most drafts.

Granderson was 33 last year and coming off an injury-shortened season. He was also trading Yankee Stadium, a left-handed pull hitter’s haven, for the cavernous confines of Citi Field. All things considered, it was natural to expect some significant regression. And when he hit .136 through his first 100 at-bats of the season, it seemed the Mets might have a disaster of Jason Bay-like proportions on their hands.

Fortunately for them, Granderson managed to right the ship to an extent, putting together a couple of excellent months. His final line of .227/.326/.388–dragged further down by a nightmarish August (16-for-109 with a .037 ISO)–wasn’t spectacular by any stretch. But there were some nice takeaways buried in there.

For one, his bat speed doesn’t seem to have slowed enough to justify the statistical hits he took across the board. Despite seeing 56.3% fastballs–the most he’s seen since 2010 by a wide margin–his Z-Contact % of 85% was in line with his 85.8% career average, and not far removed from the league average of 87%. I suspect the uptick in fastballs resulted from opposing teams banking on an age-slowed swing, but Granderson’s contact rates on high-velocity pitches in the zone didn’t suffer for it.

Granderson also set a career high in O-Contact %, at 62.7%. Ordinarily that could indicate a lapse in plate discipline as much as sustained bat speed, except that Granderson’s O-Swing % of 26.2% is roughly in line with his average over the four prior years. He also posted the second-highest walk rate of his career (12.1%) and his lowest strikeout percentage since 2009 (21.6%). These are not particularly impressive rates in their own right, but in the context of Granderson’s career they do help dispel the notion that last year was the beginning of the end for his hitting ability.

That is not to say, of course, that I foresee a return to the 40-home-run, .260+ ISO form that he flashed in his early Yankee years–there’s no way he ever again touches the absurd 22% HR/FB rate that sustained that run. But with the right-field fences at Citi Field moving in–a change that reportedly would have given Granderson nine more home runs had it been made last season–and some improvement on last year’s uncharacteristically bad .265 BABIP, I would not be at all surprised to see a home-run total between 25 and 30 to go along with double-digit steals and a batting average that won’t kill you. And that has real value at the price Granderson is currently going for in drafts.


Rickie Weeks’ Value in Disguise

Rickie Weeks going to the Mariners moved a lot of eyebrows, raising some and furrowing others. Weeks’s deal is worth $2 million for one year, according to Jim Bowden. To the casual fan, the move might seem a little unnecessary: Seattle already has a pretty good second baseman in Robinson Cano. Take a closer look, however, and there are some hidden metrics that point to a possible resurgence for Weeks.

Let’s first look at this acquisition from the position of the casual fan. Weeks is coming off a 2014 season in which he had only 286 plate appearances and saw a substantial reduction in power. Long story short, Weeks was a singles hitter last year: 41 singles in those 286 trips to the plate, compared to 42 singles in 113 more plate appearances in 2013. While this helped his overall batting average get back on track, from .209 in 2013 to .274 in 2014, it did nothing for his power numbers.

Weeks is also a below-average fielder. Scratch that: Weeks is the worst fielder at second base in all of baseball, and he has been for some time now. By FanGraphs’s UZR, Weeks has a career total of -56.5. That puts him dead last among second basemen with more than 5,000 innings since 2005 (Weeks’s first full season). Below are the bottom five second basemen by UZR over that same time frame. Recognize anyone?

Notice that current Seattle second baseman Robinson Cano is fourth from the bottom. This really doesn’t tell us any more than that Seattle does not put a premium on defense, and we might have suspected this all along if we had first taken a look at team UZR from the last two seasons.

There we have it. A match made in heaven. It is no coincidence that two of the bottom five defensive teams over the last two years contained two of the bottom five defensive second basemen, in Cano and Weeks. So what does this all have to do with Seattle and their recent free-agent acquisition of Mr. Weeks?

Ceteris paribus. All other things being equal: if we take defense out of the evaluation (because Seattle is not focusing on defense at this time), we can better understand what Seattle saw in this now-32-year-old utility man.

Our answers lie within the batted-ball statistics. Over his career, Weeks has consistently posted a fly-ball percentage of 35-36%; even in 2013 it was 32.7%. Last season that percentage sank dramatically to a career-low 25%. This may or may not be a bad sign; we will come back to the fly-ball percentage shortly. Now let us look at the HR/FB ratio.

Last season Weeks saw a spike in his HR/FB ratio, which reached an all-time high of 17.8%. His career average for that metric is 14%. With his fly-ball percentage at an all-time low and his HR/FB ratio at an all-time high, we can reasonably expect those two metrics to meet somewhere in the middle this upcoming season.
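The "meet somewhere in the middle" argument can be made concrete with a back-of-the-envelope calculation. The rates below come from the text; the batted-ball total is a hypothetical figure chosen purely for illustration:

```python
# Back-of-the-envelope sketch of the "meet in the middle" argument for Weeks.
# Fly-ball and HR/FB rates are from the text; the batted-ball total (350) is
# a hypothetical full-season figure, not an actual stat.

def expected_hr(batted_balls, fb_pct, hr_per_fb):
    """Expected home runs = batted balls * fly-ball rate * HR per fly ball."""
    return batted_balls * fb_pct * hr_per_fb

batted_balls = 350  # hypothetical

# 2014 extremes: career-low FB% paired with career-high HR/FB
hr_extremes = expected_hr(batted_balls, 0.25, 0.178)

# Career norms: more fly balls at the career-average HR/FB rate
hr_norms = expected_hr(batted_balls, 0.35, 0.14)

print(round(hr_extremes, 1), round(hr_norms, 1))
```

On these assumed numbers, the career-norm scenario actually yields slightly more home runs than last year's extreme rates did, which is the crux of the argument: hitting more fly balls can offset an HR/FB ratio that regresses back toward 14%.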

There is one last measurement we should look at in order to fully understand Weeks’s potential value. Jeff Zimmerman and Bill Petti, of FanGraphs fame, run their own website, baseballheatmaps.com, where one can look up batted-ball distance for any player going back to 2007. Rickie Weeks has a career average fly-ball distance of 292 feet; last year, his average was 285 feet. That slight decline is understandable given his age. Weight this how you wish, but it doesn’t seem like Weeks is going through any more of a power decline than other professionals have experienced at his age.

Putting it all together: if Weeks starts hitting more fly balls and, if nothing else, maintains his career-average HR/FB ratio, the Mariners will reap the full value of his services. His defense is subpar at best, but Seattle does not seem too concerned about that. Right-handed power is scarce at the moment, especially at second base. Weeks will add depth in Seattle, but the real value might come during the season, when teams start looking for power to boost their lineups for a playoff push—that is, if Weeks can deliver.