Archive for February, 2015

Does Seeing More Pitches Lead to More Runs?

There are many commonly held notions in baseball that turn out to be false. For example, pundits have long suggested that a good hitter provides protection for another good hitter; studies have examined this, and it does not hold up. Another commonly stated notion is that seeing a lot of pitches is a good thing. This idea is repeated not only by former players making confident assertions without evidence, and by TV broadcasters with their never-ending supply of clichés, but also by smart sabermetricians.

But is this notion true? Does seeing more pitches really lead to more runs? First and foremost, I want to thank Owen Watson, who on September 30th, 2014, wrote an article for The Hardball Times showing that there is a correlation between seeing pitches and drawing walks (you can find his article here). That is basically where I got the idea for this study. The study was well done; however, I don't think it was asking the right question. Yes, there is a correlation between seeing pitches and walks, and walks are good, but that doesn't necessarily mean that seeing more pitches leads to more runs, or that seeing more pitches is a good thing. There are other factors to consider before reaching that conclusion (Watson's article was about pitching efficiency, and I want to make it clear that I'm only focusing on this specific aspect of it).

For example, the Red Sox saw a lot of pitches in 2014, yet they weren't one of the top run-scoring teams. The Royals, meanwhile, went all the way to the World Series last year, and they don't exactly see a lot of pitches; in fact, they're famous for having a lineup full of free swingers. Finally, while getting into deep counts leads to more walks, it can just as easily lead to more strikeouts. This is what made me question whether seeing more pitches is a good thing. While Watson's study looked at the correlation between walks and pitches per plate appearance, it ignored several other factors that could make seeing a lot of pitches counterproductive.

Ok, now let's get to the fun stuff. The way I constructed this study was rather simple: I basically used the same model Watson did, except I swapped BB% for R/G (runs per game). Below is a chart that examines the relationship between Pit/PA (pitches per plate appearance) and R/G for every team in the 2014 season. The X-axis represents the teams, and there are two data series on the Y-axis: blue represents R/G, and red represents Pit/PA. Oh, and if you don't know what LgA is on the X-axis, that's the league average.

[Chart: R/G and Pit/PA for each team, 2014]

So there it is. As you can probably tell, there is no real correlation between pitches seen and runs scored. The correlation coefficient, by the way, is R = -0.0486. If you are unfamiliar with correlation coefficients, all you really need to know is that a coefficient of 0 indicates no real relationship between the two variables. The correlation here is slightly negative, but it's far too close to zero to be meaningfully interpreted as a negative relationship.
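For anyone who wants to reproduce this kind of check, here is a minimal sketch of the calculation. The file and column names (Team, PitPA, RG) are placeholders rather than the actual dataset I used; the team-level Pit/PA and R/G figures themselves come from Baseball Reference.

```python
import pandas as pd

# Hypothetical CSV with one row per team: Team, PitPA (pitches per PA), RG (runs per game)
teams = pd.read_csv("team_batting_2014.csv")

# Pearson correlation coefficient between pitches per PA and runs per game
r = teams["PitPA"].corr(teams["RG"])
print(f"R = {r:.4f}")  # a value near zero means essentially no linear relationship
```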

You might, at this point, find this data hard to believe. Well, consider this: strikeouts, as I've already mentioned and can't mention enough, are at an all-time high. Going deeper into counts therefore puts a hitter at a higher risk of striking out, which may be one explanation for the data above. Also, seeing more pitches means you are wearing out the starting pitcher, which means you are far more likely to face the bullpen. This is not necessarily a good thing! Bullpen pitchers are better than ever, and facing the bullpen in today's game may actually be counterproductive.

Now let's consider one final element. This study is not perfect and has a few flaws. Most notably, it only covers 2014, which may have been a blip on the radar. I will therefore be looking at more seasons of data to see whether this finding holds up. I will also look at the correlation between pitches seen and K% to get a better understanding of whether it is beneficial to see a lot of pitches. I simply thought this result was too interesting not to share, especially as we head into a new season of baseball. Hopefully it will encourage people to be more critical when they are watching the game and listening to pundits on TV. Remember, just because someone says something doesn't mean it is true.

Thanks to Owen Watson for his study in The Hardball Times; he now writes for FanGraphs. All of the data was found at Baseball Reference.


The Most Signature Pitch of 2014

If you were feeling charitable, you could say this post owes a lot to Jeff Sullivan’s recent set of articles examining pitch comps. If you weren’t feeling charitable, you could say this post is a shameless appropriation of his ideas. Either way, you should read those articles! They were very good, and very entertaining, and directly inspired this post. There were seven, in total: here, here, here, here, here, here, and here. I’ll wait.

Back? Good! In the comments of the third article, someone asked Jeff about finding the “most signature” pitch, or the pitch with the worst/fewest comps. Jeff said: “Wouldn’t be surprised if it was Dickey or the Chapman fastball. That math… I’m afraid of that math, but I might make an attempt.” Jeff has looked at unique pitches twice (Carlos Carrasco’s changeup and Odrisamer Despaigne’s changeup, the last two articles linked above), but I wanted to attack the question in a less ad-hoc fashion, looking at all pitches rather than singling some out.

Jeff wasn’t wrong, though – the math is not simple. His methodology doesn’t really work here for a couple reasons. First of all, I’m looking for uniqueness rather than similarity. I could just flip Jeff’s method around and look for high comp scores, like what he did for the Carrasco/Despaigne changeups, but I also want to consider all pitch types. Again, Jeff sort of did this in the Despaigne article, by comparing his changeup to a few different pitch types, but that is not really feasible for every pitch thrown.

What this means is that a new method is needed to directly calculate dissimilarity. We could find the maximum distances from the mean (basically Jeff’s method), which would work for a single pitch type: if all the pitches are clustered together, with similar velocities and breaks, calculating the distance from the mean to find the weirdest pitch makes sense. But consider this hypothetical set of pitches, graphed on two axes for simplicity:

[Figure: a hypothetical set of pitches, with an isolated red point at the center of the data]

Obviously, the pitch that corresponds to the red point is the sort of thing we’d like to identify as unique. It’s also exactly at the center of that dataset, and would show up as the least unique pitch, if distance from the mean was used to determine uniqueness. Luckily, there’s an algorithm that is designed to find outliers in a more rigorous way.

This is where the math gets scary. The algorithm is called Local Outlier Factor analysis, which identifies outliers in a dataset based on the density of data around each point compared to the densities around its neighbors. In this context, the density around a point is a function of how similar the best comps are for each pitch. Each point gets a score, where anything near 1 indicates a normal point, and higher values indicate greater isolation. I'm not going to go into detail, but if anyone wants to learn more, feel free to ask in the comments, or just Google it. It's fairly simple to run it on all pitches, with the relevant variables of velocity, horizontal break, and vertical break.
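For the curious, here is a rough sketch of how such a calculation could be run with scikit-learn's LocalOutlierFactor. The input file and column names are placeholders for PITCHf/x-style season averages (one row per pitcher and pitch type), and this is an illustration of the technique rather than the exact code behind the tables below; scaling the three features first would be a reasonable extra step.

```python
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical CSV: one row per pitcher/pitch type thrown at least 100 times.
# Columns assumed: pitcher, pitch_type, throws ("R"/"L"), velo, h_mov, v_mov
pitches = pd.read_csv("pitch_averages_2014.csv")

results = []
for hand, group in pitches.groupby("throws"):  # score righties and lefties separately
    X = group[["velo", "h_mov", "v_mov"]].values
    lof = LocalOutlierFactor(n_neighbors=20)
    lof.fit(X)
    scored = group.copy()
    # negative_outlier_factor_ is roughly -LOF; flip the sign so bigger = weirder
    scored["lof_score"] = -lof.negative_outlier_factor_
    results.append(scored)

ranked = pd.concat(results).sort_values("lof_score", ascending=False)
print(ranked[["pitcher", "pitch_type", "throws", "lof_score"]].head(10))
```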

Any pitch thrown more than 100 times in 2014 was included, and righties and lefties were considered separately (since pitches that move the same way obviously are very different based on what side of the rubber they come from). But enough about methodology! Here are the top five most signature pitches, for righties and lefties, along with their LOF scores, followed by some gratuitous gifs.

RIGHTHANDERS

Name Pitch Velocity H.Mov V.Mov Outlier Score
R.A. Dickey Knuckleball 76.6 0.2 1.6 2.26
Mike Morin Change 73.7 2.0 5.7 2.16
Steven Wright Knuckleball 74.2 0.7 0.3 2.13
David Hale Fourseam 91.9 4.2 5.8 2.04
Pat Neshek Change 70.9 7.0 3.5 1.00

LEFTHANDERS

Name Pitch Velocity H.Mov V.Mov Outlier Score
Aroldis Chapman Fourseam 101.2 3.7 11.1 2.53
Erik Bedard Slider 73.6 2.0 4.1 2.19
Sean Marshall Curve 74.4 9.5 -6.7 1.91
Dan Jennings Fourseam 93.6 4.9 5.8 1.86
Zach Britton Sinker 96.2 8.6 4.7 1.85

 

 

[GIFs: Dickey knuckleball, Chapman fastball]

It's nice when things work exactly like you expect them to. The top pitches on the two lists are incredible, and incredibly unique, and while it's not a surprise to see them here, it does provide some reassurance that this measure is doing what it's supposed to. Everyone knows about Dickey's knuckleball, and if anything, it's underrated by this measure. Since it moves so randomly, the knuckleball's season averages end up being slow and pretty much neutral horizontally and vertically. While that's enough to make it show up as very odd under this measure, the individual pitches don't often follow that straight trajectory, as seen in the above gif. The same can be said for Steven Wright's knuckleball in third, but it's nice that this measure still picks them out as unique pitches.

As for Chapman, there’s not that much to say about his fastball that hasn’t already been said. It feels wrong in some way to call his fastball strange, since it is disturbingly direct in practice, but there was truly no pitch like it in 2014. The velocity is the carrying factor behind the massive outlier score, almost a full 2 MPH greater than the next fastest pitch. Interestingly, Chapman’s pitch was the only one in either top five with notably high velocity.

Looking at the weirdest pitches in baseball, what can we conclude about them as a group? First, the pitchers throwing them are generally not bad. While you’d expect someone to be at least halfway decent to get in the position to throw 100 pitches of a single type, the owners of these pitches averaged about 1 WAR in 2014. With eight of these 10 throwing primarily in relief, and having only 710.2 innings collectively, that comes out to a very respectable 2.4 WAR/200.

The pitches themselves varied in usage, from Neshek’s change, thrown 13.4% of the time, to Britton’s sinker, thrown 89.3% of the time. They also varied in effectiveness, as measured by run values, from Neshek’s 3.6/100 to Marshall’s -1.63/100. Overall, the best pitch is probably Chapman’s fastball, followed by Britton’s sinker, given both the results on those pitches and how often they use them, but as a group, these pitches are pretty good. Maybe that isn’t totally surprising, but weird does not necessarily equal effective. Any pitcher could immediately have the weirdest pitch in baseball, if he threw 40 MPH meatballs, but less absurdly, mix and control matter just as much as the movement of the pitch.

Finally, all this stuff tracks fairly well with what Jeff identified previously. Obviously, he called Dickey and Chapman, but he also wrote this article about how Zach Britton’s sinker is pretty much comp-less, and we see that very pitch in fifth for lefthanders. Odrisamer Despaigne’s change was 12th for righthanders. Interestingly, Carrasco’s change is 98th on that same list, indicating this method doesn’t think he’s incredibly unique. Overall, this was mostly just a fun exercise, but maybe there’s more to this list, so if you want to poke around, it’s in a public Google Doc here. And like I said, if you have any questions about the methodology or anything like that, I’d be glad to answer them in the comments.


Hardball Retrospective – The “Original” 1980 Kansas City Royals

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. Consequently, Babe Ruth is listed on the Red Sox roster for the duration of his career while the Orioles claim Eddie Murray and the Cubs declare Lou Brock. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the finest single-season rosters for every Major League organization based on overall rankings in OWAR and OWS along with the general managers and scouting directors that constructed the teams. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. Additional information and a discussion forum are available at TuataraSoftware.com.

Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams (a quick illustration of the standard formula follows below)
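The Pythagorean record estimates a winning percentage from runs scored and runs allowed. Here is a minimal sketch of the classic exponent-2 version; the book may use a refined exponent or run-environment adjustment, so treat this as an illustration rather than the exact formula behind OPW%.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Classic Bill James Pythagorean winning percentage (textbook exponent-2 form)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

# Example with made-up totals: a team scoring 809 runs and allowing 694
# projects to roughly a .576 winning percentage.
print(round(pythagorean_pct(809, 694), 3))
```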

Assessment

The 1980 Kansas City Royals         OWAR: 42.6     OWS: 272     OPW%: .596

GM Cedric Tallis acquired two-thirds of the ballplayers on the 1980 Royals roster. The organization selected 24 of the 33 players during the Amateur Draft. Based on the revised standings the “Original” 1980 Royals amassed 97 victories and captured the American League pennant by a five-game margin over the Oakland Athletics.

George Brett was batting .337 when he returned to the lineup on July 10 following a month-long absence. “Mullet” went on an absolute tear, collecting 71 hits in 150 at-bats (.473 BA) and driving in 47 runs to boost his average to .401 on August 17. Brett hovered around the elusive .400 mark into the middle of September 1980 before settling for a .390 BA. In addition to securing his second batting title, he recorded personal-bests in RBI (118), OBP (.454) and SLG (.664) while collecting the American League MVP Award. Brett was selected to 13 consecutive All-Star contests (1976-1988), registered 3154 base hits and supplied a .305 career BA.

Fleet-footed left fielder Willie Wilson paced the Junior Circuit with 230 base knocks, 133 runs scored and 15 triples. He earned the Gold Glove Award, manufactured a .326 BA and nabbed 79 bags in 89 attempts after swiping 83 in the previous year. John “Duke” Wathan (.305/6/58) pilfered 17 bases and established a career-high in batting average while shortstop U.L. Washington contributed 11 three-baggers and stole 20 bases.

Outfield chores were handled by Wilson, Ruppert Jones, Clint Hurdle and Al Cowens. Jones backed the club’s baserunning endeavors with 18 stolen bases but otherwise yielded substandard output compared to the 21 home runs and 33 steals from his ’79 campaign. Cowens (.268/6/59) provided further proof that his runner-up finish in the 1977 AL MVP race was an outlier. Hurdle (.294/10/60) drilled 31 doubles and registered personal-bests in virtually every offensive category.

Slick-fielding second baseman Frank “Smooth” White collected six consecutive Gold Glove Awards from 1977-1982 while Rodney “Cool Breeze” Scott purloined 63 bases and legged out 13 three-base hits. Luis Salazar solidified the bench with a .337 BA following his mid-August promotion.

Brett placed second behind Mike Schmidt in “The New Bill James Historical Baseball Abstract” for the best third baseman of All-Time. White (31st) and Wilson (54th) finished in the top 100 at their positions while Dan Quisenberry placed sixty-eighth among pitchers. 

LINEUP POS WAR WS
Willie Wilson LF 7.86 31.52
Frank White 2B -0.08 12.93
George Brett 3B 8.36 36.2
Clint Hurdle RF 1.77 14.01
Al Cowens DH/RF -0.76 10.67
Ruppert Jones CF 0.84 7.2
John Wathan C 2.39 16.49
Ken Phelps 1B -0.06 0.01
U. L. Washington SS 2.1 16.13
BENCH POS WAR WS
Luis Salazar 3B 1.11 7.11
Rodney Scott 2B 0.36 13.18
Jim Wohlford LF 0.36 4.92
Jamie Quirk 3B 0.06 3.47
German Barranca 0 0
Onix Concepcion SS -0.18 0.05
Jeff Cox 2B -0.78 1.32

Dennis Leonard eclipsed the 20-win plateau for the third time in four seasons. Pacing the circuit with 38 starts, Leonard also served up the most gopher balls (30) and earned runs (118) in the American League. Rich Gale (13-9, 3.92), Renie Martin (10-10, 4.39) and Paul Splittorff (14-11, 4.05) provided adequate support in the starting rotation.

The back-end of the bullpen pitched “lights-out” ball for the Royal Blue crew. Dan Quisenberry perplexed the opposition with his unorthodox delivery. “Quiz” tallied 12 victories and topped the leader boards with 33 saves and 75 appearances. Rookie right-hander Doug Corbett (8-6, 1.98) saved 23 contests and finished third in the 1980 AL Rookie of the Year vote. Greg “Moon-Man” Minton added 19 saves and fashioned a 2.46 ERA while Aurelio “Señor Smoke” Lopez recorded 13 wins in relief.

ROTATION POS WAR WS
Dennis Leonard SP 3.28 17.1
Rich Gale SP 1.78 10.92
Paul Splittorff SP 1.48 10.31
Renie Martin SP -0.72 5.03
Steve Busby SP -0.61 0
BULLPEN POS WAR WS
Doug Corbett RP 5.8 23.88
Dan Quisenberry RP 2.38 19.09
Greg Minton RP 1.5 12.69
Bob McClure RP 1.42 7.9
Bobby Castillo RP 1.19 9.72
Doug Bird RP 0.82 4.89
Aurelio Lopez RP 0.79 12.85
Mark Souza RP -0.27 0
Craig Chamberlain RP -0.35 0
Mike C. Jones SP -0.41 0
Jeff Twitty RP -0.61 0.06
Mark Littell RP -0.67 0

 The “Original” 1980 Kansas City Royals roster

NAME POS WAR WS General Manager Scouting Director
George Brett 3B 8.36 36.2 Cedric Tallis Lou Gorman
Willie Wilson LF 7.86 31.52 Cedric Tallis Lou Gorman
Doug Corbett RP 5.8 23.88 Cedric Tallis Lou Gorman
Dennis Leonard SP 3.28 17.1 Cedric Tallis Lou Gorman
John Wathan C 2.39 16.49 Cedric Tallis Lou Gorman
Dan Quisenberry RP 2.38 19.09 Joe Burke Lou Gorman
U. L. Washington SS 2.1 16.13 Cedric Tallis Lou Gorman
Rich Gale SP 1.78 10.92 Joe Burke Lou Gorman
Clint Hurdle RF 1.77 14.01 Joe Burke Lou Gorman
Greg Minton RP 1.5 12.69 Cedric Tallis Lou Gorman
Paul Splittorff SP 1.48 10.31 Cedric Tallis Charlie Metro
Bob McClure RP 1.42 7.9 Cedric Tallis Lou Gorman
Bobby Castillo RP 1.19 9.72 Cedric Tallis Lou Gorman
Luis Salazar 3B 1.11 7.11 Cedric Tallis Lou Gorman
Ruppert Jones CF 0.84 7.2 Cedric Tallis Lou Gorman
Doug Bird RP 0.82 4.89 Cedric Tallis Charlie Metro
Aurelio Lopez RP 0.79 12.85 Joe Burke Lou Gorman
Jim Wohlford LF 0.36 4.92 Cedric Tallis Lou Gorman
Rodney Scott 2B 0.36 13.18 Cedric Tallis Lou Gorman
Jamie Quirk 3B 0.06 3.47 Cedric Tallis Lou Gorman
German Barranca 0 0 Joe Burke Lou Gorman
Ken Phelps 1B -0.06 0.01 Joe Burke
Frank White 2B -0.08 12.93 Cedric Tallis Lou Gorman
Onix Concepcion SS -0.18 0.05 Joe Burke
Mark Souza RP -0.27 0 Cedric Tallis Lou Gorman
Craig Chamberlain RP -0.35 0 Joe Burke John Schuerholz
Mike Jones SP -0.41 0 Joe Burke John Schuerholz
Steve Busby SP -0.61 0 Cedric Tallis Lou Gorman
Jeff Twitty RP -0.61 0.06 Joe Burke John Schuerholz
Mark Littell RP -0.67 0 Cedric Tallis Lou Gorman
Renie Martin SP -0.72 5.03 Joe Burke John Schuerholz
Al Cowens RF -0.76 10.67 Cedric Tallis Charlie Metro
Jeff Cox 2B -0.78 1.32 Cedric Tallis Lou Gorman

Honorable Mention

The “Original” 2009 Royals                OWAR: 45.7     OWS: 268     OPW%: .544

Zack Greinke (16-8, 2.16) claimed the 2009 AL Cy Young Award while pacing the League in ERA and WHIP (1.073). Carlos Beltran furnished a .325 BA despite missing all of July and August due to injury. Johnny Damon (.282/24/82) slashed 36 two-base hits and scored 107 runs. Billy “Country Breakfast” Butler clubbed 51 doubles and launched 21 long balls while batting .301.

On Deck

The “Original” 2012 Rays

References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

 


Not Another Wilmer Flores Defense Post

It looks like the New York Mets are going to enter the season with Wilmer Flores as their shortstop.  Flores has become a polarizing figure among Mets fans for a myriad of reasons, the most notable of which is his defensive capability at the position.  Scouts have long held that Flores is not a capable shortstop; however, his defensive metrics were pretty good last year!  That being said, we know a single season of defensive metrics is prone to a lot of statistical noise.  And THAT being said, we know that Flores played just 443.1 innings at shortstop last season.  Uh oh.  What exactly can we take from that sample?  How much weight should we place on these defensive metrics for Mr. Flores?

Are the scouts right?  Are the metrics right?  Or is the answer somewhere in between?  (Almost definitely.)

What follows is an exercise which will answer precisely zero of the above questions.  However, I cannot remember a situation quite like this Flores predicament, so I went on a quest (through FanGraphs) to find some comparables.  What shortstops have had the type of defensive metric success Flores has had in such a short sample size, and how have they fared outside of that season?

I looked at a sample of players from 2003-2014 who played 400 to 500 innings at the position with a UZR/150 from 5 to 19 (Flores was at 12.5).  All of these parameters are quite arbitrary, but this whole exercise is quite arbitrary, so let’s move along. (A rough sketch of how such a filter can be pulled together follows below.)
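Something like the following, run against an exported FanGraphs shortstop fielding leaderboard, would reproduce the filter. The file and column names (Season, Inn, UZR/150) are assumptions about the export format, not the exact workflow used here.

```python
import pandas as pd

# Hypothetical export of the FanGraphs SS fielding leaderboard, one row per player-season
fielding = pd.read_csv("fangraphs_ss_fielding_2003_2014.csv")

comps = fielding[
    fielding["Inn"].between(400, 500)        # 400-500 innings at shortstop
    & fielding["UZR/150"].between(5, 19)     # roughly bracketing Flores' 12.5
    & fielding["Season"].between(2003, 2014)
]

print(comps[["Season", "Name", "Inn", "UZR/150"]].sort_values("UZR/150", ascending=False))
```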

This brings us a list of ten seasons excluding Flores.  The seasons are as follows:

 

2014    Jose Ramirez (498.2 Innings, 18.9 UZR/150)

2008    Marco Scutaro (472.1, 17.6)

2008    Maicer Izturis (448, 15.9)

2009    Robert Andino (478.1, 14.1)

2010    Jerry Hairston (489.2, 8.9)

2006    Alex Cora (434, 8.7)

2012    Paul Janish (450.1, 8.6)

2014    Stephen Drew (413.1, 8.1)

2012    John McDonald (426.1, 6.1)

2010    Wilson Valdez (458, 5.2)

What this list of players lacks is a very poor fielding shortstop.  The lowest career shortstop UZR/150 of the bunch belongs to Mr. Izturis, at -3.1 in 1697.1 innings.  This seems to be a list of humans about whom you can confidently state: “Hey!  None of these players were atrocious major-league defensive shortstops over their careers!”

So what does this mean with regard to Flores?  Basically, nothing.  However, Mets fans can now take solace in knowing that the 10 players (from the last 12 seasons) who had the most statistically similar defensive seasons to Flores’ 2014 went on to careers in which they played the shortstop position not horribly.  Now, if Flores himself can play the shortstop position not horribly, then the Mets might just have themselves a nice little player.

Then again, there is always this:


The Horrors of Jackie Bradley Jr.’s 2014 Season

Jackie Bradley Jr. is not a terrible baseball player, and honestly, he probably didn't have a terrible 2014 season. At the very least, it wasn't as bad as people perceived. That, however, is due to his impeccable fielding and good baserunning. What follows will include none of that. It is instead a complete and utter breakdown of Bradley's 2014 hitting performance and the trends he displayed. They, as you might have guessed, are not pretty.

First, it seems important to mention that Bradley's numbers were great in the minors, not just his fielding but his hitting as well. After A-ball (Greenville), Bradley never had a wRC+ below 120, and he never had a BB% lower than 10%. Now, BB% is not always predictive, as Chris Mitchell has shown through his KATOH metric. KATOH does show, however, that BB% is predictive in AA and AAA, and Bradley's BB% was good at both levels.

Now on to 2014. This was supposed to be Bradley's big break. It was supposed to be his year: he was going to replace Jacoby Ellsbury in center and become the next great Red Sox center fielder. None of that happened. Bradley did play good defense, but his offense was atrocious, as he finished with a 47 wRC+.

So what happened? How did a player lauded for not just his defense but also his hitting ability finish the year with a 47 wRC+? First, let's acknowledge that hitting is extremely difficult, especially at the major-league level. There are also many components that go into hitting, and all of them have an impact on why a hitter hits a certain way. It's also important to look at how pitchers work a hitter, and I think that's where we'll start. Below is a graph of the hard, breaking, and off-speed pitches Bradley faced in 2014.

[Chart: hard, breaking, and off-speed pitches Bradley faced in 2014]

From this, it’s pretty evident that pitchers predominantly attacked Bradley with fastballs. This was after all his first major league season, and pitchers will often test young hitters or rookies with fastballs. If the hitter starts to hit the fastball well, then typically a pitcher will make an adjustment. As you can see, no adjustments were made because no adjustments were needed.

Now that we know what pitchers were throwing at Bradley, let's look at what Bradley did with those pitches. The graph below displays the outcomes of Bradley's at-bats in 2014.

[Chart: outcomes of Bradley's at-bats over the course of 2014]

This is where my eyes started to hurt. Bradley, as you can see, got off to a good start, but everything fell off quickly after that. In fact, things fell apart so badly that Bradley didn't get a single extra-base hit in the last two months of the season. While I like this graph for explaining Bradley's struggles, I think the pie chart below, provided by Baseball Savant, gives an even better picture of just how bad Bradley was in 2014.

[Pie chart: distribution of Bradley's pitch outcomes in 2014, via Baseball Savant]

There are many outcomes that can come from a pitch: a foul, a whiff, a called strike, a ball, a ball in play, and finally a hit. Bradley got a hit considerably less often than any other outcome. This is not a recipe for success. Hold on, let me clarify that. The fact that hits were Bradley's most infrequent outcome was not the problem; Mike Trout's most infrequent outcome, after all, was also his hits. The problem was Bradley's 4.6% hit rate.

Another problem is that Bradley was simply not putting the ball in play enough, and the balls he did put in play were not turning into enough hits (.284 BABIP). This, however, is only part of the problem. To get a better understanding of why Bradley didn't get enough hits, we should examine where he was hitting the ball. For this, we'll look at a spray chart provided by Brooks Baseball and check for any consistent trends.

[Spray chart: Bradley's balls in play, 2014, via Brooks Baseball]
Here are the outcomes when Bradley put the ball in play. What is distinctly clear is that Bradley pulled the ball a lot, especially on the infield. He also doesn't seem to have been hitting many hard ground balls, which would explain his lack of infield hits. As you can see, over a full season Bradley mustered only four hits on the infield and none the other way. The Red Sox have talked about working on Bradley's swing; they've suggested that it is too uppercut-heavy and that he needs to start swinging down on the baseball. From this chart, it seems pretty evident why they want to do that. They probably want Bradley to be able to hit the ball the other way, not just in the air but also on the ground, so as to maximize his ability to get hits.

While fixing a swing is important, it addresses only one of the problems. There are more elements that go into hitting, and a player doesn't end up with a 47 wRC+ without some kind of approach problem. That is where we'll take our final look: at Bradley's plate approach and the tendencies he displayed.

There are a few factors and components that make up a hitter's approach. One of them is his tendency to swing. The more a hitter swings, the less patient he is likely to be, and the less likely he is to have a good approach at the plate. Below is a graph of Bradley's month-by-month swing percentage on hard, breaking, and off-speed pitches in 2014.

[Chart: Bradley's month-by-month swing percentage on hard, breaking, and off-speed pitches, 2014]

This, as you might be able to tell, is not good. Bradley's tendency to swing gradually got worse as the year went on, meaning that he either got further away from his approach or simply got frustrated. Let's not panic, however; just because a hitter has a high swing% doesn't mean he can't be a successful hitter, especially if he makes contact on a lot of his swings. Vlad Guerrero was a great hitter and he swung at everything; he also hit everything. So let's look at Bradley's whiffs per swing (whiff/swing). Why? Because if you're swinging a lot, you don't want a high whiff-per-swing rate: it means most of the pitches you're swinging at aren't even becoming balls in play, let alone hits, and it probably means you're striking out a lot and chasing a lot of pitches.
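As a quick illustration, whiff/swing is just swinging strikes divided by total swings; a rough monthly version from pitch-by-pitch data might look like the sketch below. The column names (description, game_date) are placeholders for whatever pitch-level export you have, not the Brooks Baseball internals.

```python
import pandas as pd

# Hypothetical pitch-by-pitch export with a 'description' column and a game date
pitches = pd.read_csv("bradley_pitches_2014.csv", parse_dates=["game_date"])

swing_events = {"swinging_strike", "foul", "hit_into_play"}
pitches["swing"] = pitches["description"].isin(swing_events)
pitches["whiff"] = pitches["description"] == "swinging_strike"

monthly = pitches.groupby(pitches["game_date"].dt.month).agg(
    swings=("swing", "sum"), whiffs=("whiff", "sum")
)
monthly["whiff_per_swing"] = monthly["whiffs"] / monthly["swings"]
print(monthly)
```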

[Chart: Bradley's month-by-month whiff/swing rate, 2014]

As you might be able to tell, Bradley swung and missed a lot in 2014. It's also important to note that in the last two months of the year, Bradley's playing time was significantly reduced: he got only 35 plate appearances in August and 36 in September. So while it might look like Bradley started swinging and missing less in the final month, that came in a very small sample.

Finally, let's look at Bradley's overall plate-approach tendencies. What follows is a chart provided by Brooks Baseball that examines a player's overall plate approach. Using PITCHf/x data, it measures his passiveness and aggressiveness at the plate through detection theory, which analyzes the decisions one makes in the face of uncertainty. There are essentially two parameters in detection theory, C and d'; C, which is the one used for this graph, reflects the strategy of the response (the hitter's bias toward swinging or taking). Ok, that's enough on the subject.
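For those curious about the underlying math, here is one standard signal-detection formulation, treating swings at in-zone pitches as "hits" and swings at out-of-zone pitches as "false alarms." This is a generic textbook version with made-up counts, not necessarily the exact definitions Brooks Baseball uses.

```python
from scipy.stats import norm

def detection_parameters(swings_in_zone, in_zone, swings_out_zone, out_zone):
    """Return (d_prime, C) from swing decisions, textbook signal-detection style.

    H (hit rate)        = P(swing | pitch in zone)
    F (false alarm rate) = P(swing | pitch out of zone)
    d' = z(H) - z(F)            (sensitivity)
    C  = -(z(H) + z(F)) / 2     (criterion: negative = aggressive, positive = passive)
    """
    H = swings_in_zone / in_zone
    F = swings_out_zone / out_zone
    zH, zF = norm.ppf(H), norm.ppf(F)
    return zH - zF, -(zH + zF) / 2

# Example with hypothetical counts: 180 swings on 300 in-zone pitches,
# 90 swings on 280 out-of-zone pitches
print(detection_parameters(180, 300, 90, 280))
```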

[Chart: Bradley's plate-approach (C) trend over 2014, via Brooks Baseball]

Just like his swing tendencies, Bradley's overall plate approach was heading in the wrong direction. Throughout the year, he got consistently more and more aggressive, essentially losing what made him a successful hitter in the minors. These are likely the signs that led the Red Sox to sign Rusney Castillo out of Cuba to a seven-year deal, and they might be a reason why the Red Sox are in serious talks with the Braves about a potential trade involving Bradley.

That being said, while it is clear that Bradley's tendencies and approach were all heading in the wrong direction, this doesn't mean he can't turn things around. Players make adjustments all the time, and I'm not sure these stats are necessarily predictive of future performance. Baseball, after all, is a game of adjustments: pitchers adjust to hitters, and hitters counter with adjustments of their own. It doesn't seem that Bradley will ever be a great hitter, or even a good one, but he can be a league-average hitter. I've spent a lot of time discussing Bradley's offense and not nearly enough on his defense. Bradley is a great defensive center fielder, maybe the best, and that has real value. If he can simply become an average hitter, he should have a spot in the majors for many years to come.

All of the graphs can be found on Brooks Baseball, and the pie chart on Baseball Savant. A lot of the stats can also be found on FanGraphs.


Delayed Overanalysis of Casey Janssen

The Nats signed reliever Casey Janssen, formerly of the Blue Jays, to a one-year, $5-million contract a few weeks ago (feel free to stop reading now to avoid the existential dread associated with over-analyzing Casey Janssen). Overall, it's hard not to like this pick-up. One year and five million dollars is basically nothing (except when it comes to signing a second baseman), and the Clippard trade certainly left a hole in the bullpen. There was also a recent stretch when Janssen was quite good. From 2011-2013, he averaged 57.1 IP, 8.9 strikeouts per nine innings, and a sub-3.00 FIP. WAR isn't the best way to measure relievers, but he averaged 1.2 WAR a season over those three years, which put him squarely in the pretty-damn-good category of relief pitchers.

So why did a recently good closer sign for a seemingly below-market sum? Because 2014 was mostly terrible. Strikeouts were way down (5.5 K/9 in 2014 compared to 8.5 in 2013), homers were way up (1.2 HR/9 in 2014 compared to 0.5 in 2013), and his groundball percentage dropped from 48% to 34%. These are all fairly alarming trends for a relief pitcher who is 33 and doesn't throw very hard (2014 average fastball velocity: 89.3 mph). Every analysis of relief pitchers should contain small-sample-size warnings in all capital letters, but important indicators trending that strongly usually point to something being wrong.

In July of last season, Janssen came down with a particularly awful bout of food poisoning, and he probably came back too quickly. And looking at the mid-season splits, there’s a case to be made that it was the (negative) turning point for the rest of Janssen’s season. Let’s compare:

1st half: 22 IP, 1.23 ERA, 0 HR, 14 Ks, 1 BB, .218 wOBA against
2nd half: 23.2 IP, 6.46 ERA, 6 HR, 14 Ks, 6 BBs, .378 wOBA against

In the first half, Janssen made opposing hitters look like Austin Kearns. In the second half, they all looked like Yasiel Puig. His numbers did take a nosedive in July when he was sick, but they got worse in August, when one would have expected him to be feeling better (or to have been put on the DL to recuperate). It's impossible for anyone to really know how he was feeling, or whether food poisoning actually was the main cause of Janssen's second-half struggles. His velocity didn't change from the first half to the second half, and his strikeout rate remained about the same. The uptick in walks and home runs in the second half is troubling, but maybe first-half Janssen was the fluke, given a year-over-year decrease in velocity (he lost about 0.8 mph on his fastball from 2013 to 2014) and a decrease in strikeouts. For comparison's sake, here is an unnamed reliever's first- and second-half splits in 2014:

1st half: 37 IP, 0.97 ERA, 1 HR, 36 Ks, 11 BBs, .208 wOBA against
2nd half: 25 IP, 6.48 ERA, 3 HR, 23 Ks, 8 BBs, .375 wOBA against

This reliever? Rafael Soriano. There wasn't an injury narrative to blame for his fell-off-a-cliff second half, but he stunk nonetheless. Screwy things can happen in small samples, which is why we try to avoid over-analyzing them. Janssen may simply have had impeccable timing, and his new true-talent level as a command relief pitcher may be that of a 4.00 ERA arm. But unlike with Soriano, there is a realistic narrative for Janssen that fits the timeline of his struggles. Here's another first half/second half comparison:

[Zone chart: Janssen's first-half whiff rates by location, 2014]

[Zone chart: Janssen's second-half whiff rates by location, 2014]

While his strikeout rate was basically the same from the first half to the second half, these charts show that his whiff rates weren't. Janssen had much more success getting swings and misses both down and up in the zone earlier in the season, so while his velocity was the same between halves, it appears his stuff wasn't.

Again, in such small samples, it's impossible to draw definitive conclusions. It's true that first-half Janssen looked pretty similar to 2011-2013 Casey Janssen, while second-half Janssen looked more like Brian Bruney. It's reasonable to look at the splits and say that Janssen's bout with food poisoning ruined what looked to be a promising season. It's also reasonable to look at his decrease in velocity and strikeout rate and think this was money not well spent. But a paltry (in the context of MLB) five million dollars isn't that much money anyway, so why the hell not?


The Home Run Derby and Second Half Production: A Meta-Analysis of All Players from 1985 to 2013

The “Home Run Derby Curse” has become a popular concept discussed among the media and fans alike. In fact, at the time of writing, a simple Google search of the term “Home Run Derby Curse” turns up more than 180,000 hits, with reports concerning the “Curse” ranging from mainstream media sources such as NBC Sports and Sports Illustrated, to widely read blogs including Bleacher Report and FanGraphs, to renowned baseball analysis organizations like Baseball Prospectus and even SABR.

This article seeks to shed greater light on the question of whether the “Home Run Derby Curse” exists, and if so, what is its substantive impact. Specifically, I ask, do those who participate in the “Home Run Derby” experience a greater decline in offensive production in comparison to those players who did not partake in the Derby?

Answering this question is of utmost importance to general managers, field managers, and fans alike. If players who partake in the Derby do experience a decline in offensive production between the first and second halves of the season, those in MLB front offices and dugouts can use this information to mitigate potential slumps. Further, if Derby participation leads to a second half slump, fantasy baseball owners can use this knowledge to better manage their teams. Simply put, knowing the effect of Derby participation on offensive production provides us with a deeper understanding of player production.

The next section of this study will address previous literature concerning the “Home Run Derby Curse,” and will discuss how this project builds upon these studies.

Previous Research

Although a good deal of research has been conducted concerning the “Curse,” the veracity of much of this work is difficult to assess. Many of the previous studies on this issue have used subjective analysis of first and second half production of Derby participants in order to assess the effects of the “Curse” (see Carty 2009; Breen 2012; Catania 2013). Although these works have certainly highlighted the need for further research, they are simply not objective enough to definitively address the question of the “Home Run Derby Curse’s” existence.

To date, the most rigorous statistical analysis of the “Curse” is an article by McCollum and Jaiclin (2010), which appeared in Baseball Research Journal. In examining OPS and HR %, McCollum and Jaiclin found a statistically significant relationship between participation in the Derby and a decline in second half production. At the same time, they examined the relationship between first and second half production in years in which players who had previously participated in the Home Run Derby did not participate, and found no statistically significant drop off in production in those years.

At first glance, this appears to be fairly definitive evidence that the “Curse” is real; however, they also found that players’ first-half production in years in which they participated in the Derby was substantially higher than in years in which they did not participate. This suggests that players who partake in the Derby are chosen because of extraordinary performances. Based on these findings, McCollum and Jaiclin conjectured that the post-Derby decline for participants is likely due to the fact that their first-half performance was elevated, and the second-half decline is simply regression to the mean.

Despite the strong statistical basis of McCollum and Jaiclin’s work, there are a number of points in this work that need to be addressed. First, McCollum and Jaiclin only examine those players who have participated in the Derby and have at least 502 plate appearances in a season, thus prohibiting direct comparison with those who did not participate in the Derby. At the heart of the “Home Run Derby Curse,” however, is the idea that participants in the Derby experience a second half slump greater than is to be expected of any player.

The question that derives directly from this conception is, do Derby participants experience a slump greater than is to be expected from players who did not participate in the Derby? To sufficiently answer this question, players who participated in the Derby must be compared to those who did not. Due to a methodology that relies upon data of only Derby participants, Jaiclin and McCollum were unable to sufficiently answer this question.

Second, McCollum and Jaiclin use t-tests to test their hypotheses. This is a solid, objective statistical approach; however, it is not ideal, as it does not allow for the inclusion of control variables. There may be additional factors affecting both Derby participation and second-half production simultaneously, creating a spurious finding. This problem can only be addressed through multivariate regression.

The final issue with McCollum and Jaiclin’s work centers on their theoretical expectations and their measures of offensive production. Theoretical extrapolation is absolutely necessary in statistical work, as it informs analysis. Without theoretical expectations, researchers are simply guessing at how best to measure their independent and dependent variables. Very little theoretical explanation of the “Curse” has been put forth in previous work, including McCollum and Jaiclin’s piece, and therefore their measurement of offensive output is not necessarily best.

This article is an attempt to build upon previous work concerning the “Home Run Derby Curse,” and to address the above issues. In the next section, I will develop a short theoretical framework concerning the “Curse.” Four hypotheses are then derived from this theory, which are tested using expanded data, and a different methodological approach. The results suggest that a “Curse” does exist.

Theoretical Basis of the Home Run Derby Curse

Two main theories have been posited to explain the “Home Run Derby Curse.” First, it has been suggested that participation in the Derby saps players of energy that is necessary to continue to perform well in the second half of the season (Marchman 2010). This theory is summarized well by Marchman who focused on the particular experience of Paul Konerko who went deep into the 2002 Home Run Derby in Milwaukee. He wrote:

The strange experience of taking batting practice without a cage under the lights in front of tens of thousands of people left him sore in places he usually isn’t sore, like his obliques and lower back, the core from which a hitter draws his power. Over the second half of the year, he hit just seven home runs, and his slugging average dropped from .571 to .402.

In essence, this theory argues that players who participate in the Derby experience muscle fatigue in those muscles from which power hitting is drawn. Because these fatigued muscles are imperative for power hitting, players who participated in the Derby experience reduced power, and see a drop in power numbers in the second half of the season. Thus, one can hypothesize:

H1.1: Players who participate in the Derby will see a greater decline in their power numbers than players who do not participate in the Derby.

Furthermore, one might expect that a player will experience greater decrease in energy the more swings he takes during the Derby. The logic underpinning this assertion is that using the power hitting muscles for a longer period of time should fatigue them to a greater extent. Thus, a player who takes 10 swings in the Derby should experience less muscle fatigue than a player who takes 50 swings in the Derby. Following this line of reasoning, one should expect that the Derby has a greater effect on players’ second half power hitting performance when they take more swings during the Derby. Since those players who hit more home runs during the Derby take more swings, it can also be hypothesized:

H1.2: Players who hit more home runs in the Derby will see a greater decline in power numbers in the second half of the season than players who hit fewer home runs in the Derby (including those who do not participate).

The second theory of the “Curse” proposes that participation in the Derby leads to players altering their swings (Breen 2011; Catania 2013). It is thought that this altered swing carries over into the second half of the season, affecting players’ offensive output.

Although most studies of the “Curse” rarely delve into how players tweak their swings, it is likely safe to assume that they are changing their approach in the hope of belting as many homers as possible for that one night, developing an even greater power stroke. It is a commonly accepted conjecture that power and strikeouts are positively correlated (see Kendall 2014), meaning that greater power is associated with more strikeouts. This conjecture may not hold as strongly for exceptionally talented players (e.g., Hank Aaron, Ted Williams, Mickey Mantle, Willie Mays). However, if we accept this assumption to be correct for the majority of players, it can be stated that if players change their swing to hit more home runs, they should see a corresponding increase in strikeouts in the second half of the season.[i] Thus, it can be hypothesized:

H2.1: Players who participate in the Home Run Derby will experience greater strikeouts per plate appearance in the second half of the season than those who did not participate in the Derby.

As with hypotheses 1.1 and 1.2, it can also be assumed that the effect of participation in the Derby will be greater the more swings an individual takes during the Derby. That is to say, if a player hits more home runs during the Derby, the altered swing he uses during the Derby will be more likely to carry through to the second half of the season. This leads to the hypothesis:

H2.2: Derby participants who hit more home runs in the Derby will experience greater strikeouts per plate appearance in the second half of the season than those who hit fewer home runs in the Derby (including those who do not participate).

In the next section of this study I will discuss the analytical approach, variable operationalization, and the data sources used to address the above four hypotheses.

Data and Analytical Method

Below I will begin with a discussion of the data used in this study. I will then discuss the independent and dependent variables for each hypothesis as well as the control variables used in this study. Finally, I will discuss the methodological approach used in this study.

Data Sources and Structure

Above, it is hypothesized that those players who either participated in the Derby, or performed well in the Derby will see greater offensive decline between season halves than those who either did not participate in the Derby, or struggled in the Derby. In order to properly test these hypotheses one must use data that includes those who participated in the Home Run Derby, and those who did not participate in the Derby.

This paper performs a meta-analysis of all players with at least 100 plate appearances in both the first and second halves of the season from 1985 (the first year in which the Home Run Derby was held) through 2013. This makes the unit of analysis of this paper the player-year. This data excludes observations from 1988 as the Derby was cancelled due to rain. Further, 1994 is also excluded as the second half of the season was cut short due to the players’ strike.

Independent Variables

The main independent variable for hypotheses 1.1, and 2.1 is a dichotomous measure of participation in the Home Run Derby. A player was coded as a 1 if they participated in the Derby and a 0 if they did not participate in the Derby. Between 1985 and 2013 a total of 229 player-years were coded as participating in the Derby.

The independent variable for hypotheses 1.2, and 2.2 is a measure of each player’s success in the Home Run Derby. This is an additive variable denoting the number of home runs each player hit in each year in the Derby. This variable ranges from 0 to 41 (Bobby Abreu in 2005). Those who did not participate in the Derby were coded as 0.[ii]

Dependent Variables

Hypotheses 1.1 and 1.2 posit that participation in the Derby and greater success in the Derby will lead to decreased power numbers respectively. Power hitting can be measured in numerous ways, the most obvious being home runs per plate appearance (HRPA). However, theoretically, if players are being sapped of energy, this should affect all power numbers, not simply HRPA. Restricting one’s understanding of power to HRPA ignores other forms of power hitting, such as doubles and triples. So as not to remove data variance unnecessarily, one can measure change in power hitting by using the difference between first and second half extra base hits per plate appearance (XBPA) rather than HRPA.

Thus, the dependent variable for hypotheses 1.1 and 1.2 is understood as the difference between XBPA in the first and second halves of the season for each player-year. XBPA is calculated by dividing the number of extra base hits (doubles, triples, and home runs) a player has hit by the number of plate appearances, thus providing a standardized measure of extra base hits for each player-year.

The dependent variable for hypotheses 1.1 and 1.2 was created by calculating the XBPA for the first half of the season and the second half of the season for each player-year. The second-half XBPA for each player-year was then subtracted from the first-half XBPA. Theoretically, this variable can range from -1.000 to 1.000. In reality it ranges from -.116308 to .1098476, with a mean of -.0012872 and a standard deviation of .025814.

The dependent variable for hypotheses 2.1 and 2.2 is the difference between first and second half strikeouts per plate appearance (SOPA) for each player-year. SOPA is calculated by dividing the number of strikeouts a player has by his plate appearances, thus providing a standardized measure of strikeouts for each player-year.

The dependent variable for these hypotheses was created by calculating the SOPA for the first half of the season and the second half of the season for each player-year. The second-half SOPA for each player-year was then subtracted from the first-half SOPA. Theoretically, this variable can range from -1.000 to 1.000. In reality it ranges from -.1857143 to .1580312, with a mean of -.003198 and a standard deviation of .0378807.
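To make the variable construction concrete, here is a minimal sketch of how the two dependent variables could be built from split-half batting data. The input file and column names (half, PA, doubles, triples, HR, SO) are placeholders describing a generic player-year/half layout, not the actual dataset used in this study.

```python
import pandas as pd

# Hypothetical data: one row per player-year-half with counting stats
splits = pd.read_csv("player_half_splits_1985_2013.csv")

splits["XBPA"] = (splits["doubles"] + splits["triples"] + splits["HR"]) / splits["PA"]
splits["SOPA"] = splits["SO"] / splits["PA"]

# Pivot so each player-year has a first-half and second-half column, then take first - second
wide = splits.pivot_table(index=["player_id", "year"], columns="half",
                          values=["XBPA", "SOPA", "PA"])
df = pd.DataFrame({
    "diff_XBPA": wide[("XBPA", "first")] - wide[("XBPA", "second")],
    "diff_SOPA": wide[("SOPA", "first")] - wide[("SOPA", "second")],
})

# Keep player-years with at least 100 PA in each half, mirroring the sample restriction
df = df[(wide[("PA", "first")] >= 100) & (wide[("PA", "second")] >= 100)].reset_index()
print(df.describe())
```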

Control Variables

A number of control variables are included in this study. A dummy variable denoting whether a player was traded during the season is included.[iii] To control for the possible effects of injury in the second half of the season, a dummy variable denoting if a player had a low number of plate appearances in the second half of the season is included.[iv] Further, I include a dummy variable measuring whether a player had a particularly high number of first half plate appearances.[v] Finally, controls denoting observations in which the player played the entire season in the National League,[vi] observations that fall during the “Steroid Era,”[vii] observations that fall in a period in which “greenies” were tolerated,[viii] and observations that fall during the era of interleague play are included.[ix]

Analytical Approach

The main dependent variables used to test the above hypotheses are the difference between first- and second-half XBPA, and the difference between first- and second-half SOPA. Each of these variables fits a normal curve almost perfectly.[x] For each, the theoretical range runs from -1 to 1, with an infinite number of possible values between. Although these variables cannot range from negative infinity to infinity, as OLS technically assumes, OLS regression is still the most appropriate methodological approach for this study.
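As an illustration of the modeling setup, the sketch below regresses the XBPA difference on Derby participation plus the dummy controls using statsmodels. The variable names are placeholders that mirror the controls described above; this is a generic sketch of the approach, not the exact specification or software used in the study.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical analysis file: one row per player-year with the difference variables,
# a 0/1 Derby indicator, and 0/1 dummies for each control described above.
df = pd.read_csv("derby_panel_1985_2013.csv")

model = smf.ols(
    "diff_XBPA ~ derby + traded + diminished_pa + high_1h_ab"
    " + national_league + steroid_era + greenies_era + interleague",
    data=df,
).fit()
print(model.summary())  # the coefficient on 'derby' is the estimated participation effect
```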

In the next section of this piece, I will report the findings of the tests of hypotheses 1.1 through 2.2. I will then discuss the implications of these findings.

Analysis

This section will begin with the presentation and the discussion of the findings concerning hypotheses 1.1 and 1.2. I will then present and discuss the findings of tests of hypothesis 2.1 and 2.2.

Analysis of Hypotheses 1.1 and 1.2

Column 1 of table 1 shows the results of the test of hypothesis 1.1. The intercept for this test is -.0007, but it is statistically insignificant. This suggests that, with all variables included in the test held at 0, players will see no change in their XBPA between the first and second halves of the season. The “Derby participation” variable shows a statistically significant coefficient of .008. This means that if a player participates in the Derby, he can expect his second-half XBPA to drop by .008.

Of course, there is a possibility that those who participate in the Derby will see a greater drop in their XBPA than the average player because, in order to be chosen for the Derby, a player will have a higher XBPA.[xi] This would then make it more likely that players who participate in the Derby see a greater drop in XBPA than players who do not participate as they regress to the mean. To account for this, the sample can be restricted to players with a high first half XBPA.

The mean first-half XBPA for all players (those who do and do not participate in the Derby) between 1985 and 2013 is .0766589. The sample is restricted to only those players above this mean for the tests displayed in column 2 of table 1. As can be seen, the intercept is statistically significant, with a coefficient of .01. Those who have average or above-average XBPA in the first half of the season can expect to see their XBPA drop by .01 after the All-Star Break when all other variables in the model are held equal.

Table 1: The Effect of Home Run Derby Participation on the Difference in XBPA.

  Full Sample XBPA > .0766589 XBPA > .1138781
Derby Participation .008***(.002) .002(.002) -.004(.003)
Trade .001(.001) .002(.002) .007(.006)
Diminished PAs -.005***(.001) -.002**(.001) .002(.001)
High 1st Half ABs -.001(.001) -.006***(.001) -.008**(.003)
National League .0003(.001) -.001(.001) .0002(.002)
Steroids -.001(.002) -.001(.002) .004(.005)
Greenies .004**(.002) .003(.002) .003(.002)
Interleague .002(.002) .00003(.002) -.01(.005)
Intercept -.0007(.002) .01***(.002) .04***(.005)
N 7,330 3,904 636

Note: Values above represent unstandardized coefficients, with standard errors in parentheses. *p<.05, **p<.01, ***p<.001

Turning to the Derby participation variable, one notices that it is now statistically insignificant, with a coefficient of .002. When the sample is restricted to only those who showed average or above-average power in the first half of the season, the results show that Derby participants see no statistically discernible difference in their power hitting compared to those who did not participate.

The variables denoting whether a player had a low number of plate appearances in the second half of the season, or a high number of at-bats in the first half, are both statistically significant with negative coefficients. This means that if a player has a high number of first-half at-bats or a low number of second-half plate appearances, he will actually see his XBPA increase in the second half.

Although the results in column 2 of table 1 are telling, it may be useful to restrict the sample even further. Those who are selected for the Derby are, for all intents and purposes, the best power hitters in baseball. Therefore, one can restrict the sample to the best power hitters and compare only those players with a first-half XBPA equal to or above the average for Derby participants, while, of course, keeping all Derby participants in the sample.

The mean first-half XBPA for Derby participants between 1985 and 2013 is .1138781. Tests restricting the sample to only those with a first-half XBPA of at least .1138781 are displayed in column 3 of table 1. The intercept for these tests is statistically significant, with a coefficient of .04, meaning that players with a first-half XBPA at or above .1138781 can expect a drop of .04 in their XBPA after the All-Star Break. The coefficient for the Derby participation variable is -.004, but it is statistically insignificant. This suggests that those who participate in the Derby do not see a marked decrease in their power hitting after the Derby when compared to hitters of similar power.

The only variable that shows a statistically significant effect in these tests is that which denotes whether a player had a high number of first half at-bats. As with previous tests, the coefficient for this variable is negative. This suggests that players who have a high number of first half at-bats see an increase in the XBPA between the first and second halves of the season in comparison to those without a high number of first half at-bats.

Columns 1 through 4 in table 2 show tests of the effect of success in the Derby (the number of home runs hit) on the difference between first and second half XBPA. The test with a full sample is displayed in column 1 of table 2. The intercept for this test is statistically insignificant, suggesting that on average, players do not experience a marked change in their XBPA between the first and second halves of the season.

The variable denoting the number of home runs a player hit during the Derby is statistically significant and has a coefficient of .0003. This means that for every home run a participant in the Derby hit, he can expect his XBPA after the All-Star Break to decline by .0003 points.

Of course, the relationship between Derby success and the difference in first half and second half XBPA is not likely to be linear, but rather curvilinear. Thus, a measure of home runs hit during the Derby squared should be included. The test including this variable is displayed in column 2 of table 2. The intercept is again statistically insignificant suggesting that when all variables in the model are held at 0, players should not see a marked change in their XBPA between the first and second half of the season.
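A hypothetical continuation of the earlier sketch shows how such a squared term might be added; again, derby_hr (home runs hit in the Derby) is an assumed column name rather than part of the actual analysis files.

```python
# Hypothetical continuation of the earlier sketch: add a squared Derby home run term
# to allow for the curvilinear relationship described above (full sample this time).
df["derby_hr_sq"] = df["derby_hr"] ** 2   # derby_hr = home runs hit in the Derby (assumed column)

hr_model = smf.ols(
    "xbpa_diff ~ derby_hr + derby_hr_sq + trade + diminished_pa + high_1st_ab"
    " + national_league + steroids + greenies + interleague",
    data=df,
).fit()
print(hr_model.params[["derby_hr", "derby_hr_sq"]])
```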

Table 2: The Effect of the # of Home Runs Hit in the Derby on the Difference in XBPA.

                    Full Sample                          Restricted Samples
                    Without HR^2     With HR^2           XBPA>.0766589[xii]   XBPA>.1138781[xiii]
Home Run Total      .0003*(.0002)    .001**(.0004)       .00002(.0002)        -.00002(.0002)
Home Runs Squared   .                -.00004*(.00002)    .                    .
Trade               -.001(.001)      -.001(.001)         .002(.002)           .007(.002)
Diminished PAs      -.005***(.001)   -.005***(.001)      -.002**(.002)        .002(.002)
1st Half ABs        .001(.001)       .001(.001)          -.006***(.001)       -.008***(.002)
National League     .0003(.001)      .0003(.001)         -.001(.001)          .0002(.002)
Steroids            -.001(.002)      -.001(.002)         -.001(.002)          .004(.005)
Greenies            .004**(.002)     .004**(.002)        .003(.002)           -.003(.005)
Interleague         .002(.002)       .002(.002)          -.00003(.002)        -.009(.005)
Intercept           -.001(.002)      -.001(.002)         .01(.002)            .04***(.006)
N                   7,330            7,330               3,898                636

Note: Values above represent unstandardized coefficients, with standard errors in parentheses. *p<.05, **p<.01, ***p<.001

 

The effect of success in the Derby remains statistically significant with a coefficient of .001. This means that with each home run a player hits in the Derby, his XBPA in the second half of the season will decline by .001. Further, the variable “home runs squared” is statistically significant, and has a coefficient of -.00004. This indicates that the effect of the number of home runs a player hits in the Derby on second half production decreases with more home runs. In essence, the effect of hitting 40 home runs during the Derby on second half offensive production is not simply a scaled-up version of the effect of hitting 30 home runs, and so on.

In terms of control variables, the variable denoting a diminished number of second half plate appearances is statistically significant with a negative coefficient in the tests reported in both column 1 and column 2 of table 2.

As with the tests reported in table 1, restricting the sample to only those players with average and above average first half XBPA may be useful. Column 3 of table 2 shows the results of the test of the effect of success in the Derby on the difference in XBPA between the two halves of the season when the sample is restricted to those with a first half XBPA at or above the league average (.0766589).

The intercept in this test is statistically significant with a coefficient of .012. This means that, when all variables included in this test are held at 0, players with an average or above average first half XBPA notice a decline in their second half XBPA. Importantly, the effect of the number of home runs hit during the Derby is statistically insignificant, meaning that hitting more home runs during the Derby has no statistical effect on the difference between first half and second half XBPA when the sample is restricted to those with average or above average first half XBPA.

Both the variable denoting whether a player had diminished second half plate appearances, and the variable denoting whether a player had a high number of first half at-bats, are statistically significant with negative coefficients. This implies that those who experience a diminished number of second half plate appearances, and those players with a high number of first half at-bats see an increase in their XBPA between the first and second halves of the season.

Column 4 of table 2 restricts the sample based on the mean first half XBPA of those who participated in the Derby. The mean first half XBPA of these players is .1138781. The intercept for this test is statistically significant with a coefficient of .036. The variable measuring success in the Derby is statistically insignificant, meaning that the number of home runs a player hits in the Derby has no statistical effect on the difference between first and second half XBPA when comparing Derby participants to similar power hitters who did not participate in the Derby.

In terms of controls, the variable denoting whether a player had a high number of first half at-bats is again statistically significant with a negative coefficient. This, as with previous tests, suggests that those who have a high number of first half at-bats will experience an increase in XBPA between the first and second halves of the season.

Analysis of Hypotheses 2.1 and 2.2

The results of the tests of hypothesis 2.1 (participation in the Derby will lead to more strikeouts per plate appearance) are displayed in column 1 of table 3. This column shows the relationship between participation in the Derby and the change in SOPA between the first and second halves of the season. As can be seen, the intercept is -.006 and is statistically significant, meaning that, all other things being equal, players strike out more often in the second half of the season.

The coefficient for “Derby participation” is -.005 and is statistically significant, meaning that those who participate in the Home Run Derby will see their SOPA increase by .005 more between halves of the season than players who do not participate in the Derby. When one takes into account that SOPA should increase by .006 when all other variables are held at 0, this finding suggests that Derby participants should see an increase of .011 in their SOPA between the first and second halves of the season.

Unlike XBPA, there is very little chance that SOPA is associated with selection for the Home Run Derby. Moreover, the average first half SOPA for the entire sample used in this study is .1610312, whereas the mean first half SOPA for those who participated in the Home Run Derby is .1669383. Those who participated in the Derby were actually more likely to strike out in any given plate appearance than those who did not participate in the Derby. Essentially, assuming that one should see a regression to the mean, it is more likely that those who participate in the Derby would see a decrease in SOPA between the first and second halves of the season. These results, however, tell the opposite story, and cannot be explained by a mere statistical anomaly. Therefore, it is unnecessary to restrict the sample, and one can state that hypothesis 2.1 is supported.

Turning to column 2 of table 3, one sees a test of hypothesis 2.2 (the more home runs a player hits in the Derby the smaller the difference between his first and second half SOPA will be). Much like the results in column 1 of table 3, the intercept is -.006 and is statistically significant. Thus, when holding all other variables at 0, one can expect a player’s SOPA to increase by .006 between the first and second halves of the season.

The coefficient for the variable denoting the total number of home runs a player hit during the Derby shows a statistically significant coefficient of -.0005. For every home run a player hit during the Derby, the difference between his first half SOPA and second half SOPA will decrease by .0005.

This relationship, however, is likely curvilinear. In order to account for this likelihood I include a variable in which the total number of home runs a player hit during the Derby is squared. Column 3 of table 3 reports the results of a test including a measure of the total number of home runs squared. When this variable is included the coefficient of the total number of home runs becomes statistically insignificant. This suggests that the total number of home runs a player hit during the Derby is not related to the difference in that player’s first half SOPA and second half SOPA. It must be noted, however, that this finding does not negate the result in column 1 of table 3.

Interestingly, there are a number of control variables that show statistical significance in all tests included in table 3. If a player was traded midseason, he can expect the difference between his first half SOPA and second half SOPA to shrink, meaning he will see an increase in SOPA in the second half of the season. Further, having a below average number of plate appearances in the second half of the season leads to a decrease in second half SOPA.

Table 3: The Effect of Home Run Derby Participation and the # of Home Runs Hit in the Derby on the Difference in SOPA.

                                     Participation in Derby    Number of Home Runs Hit in Derby
                                     (1=Participation)         Without HR^2      With HR^2
Derby Participation/Home Run Total   -.005*(.003)              -.0005*(.0002)    -.00001(.0002)
Home Run Total Squared               .                         .                 -.00002(.00002)
Trade                                -.004*(.002)              -.004*(.002)      -.005*(.002)
Diminished PAs                       .005***(.001)             .005***(.001)     .005***(.001)
High 1st Half ABs                    .002*(.001)               .002*(.001)       .002(.001)
National League                      .002(.001)                .002(.001)        .002(.001)
Steroids                             .002(.002)                .002(.002)        .002(.002)
Greenies                             .0003(.002)               .0003(.002)       .0004(.002)
Interleague                          -.002(.002)               -.001(.002)       -.001(.002)
Intercept                            -.006*(.003)              -.006*(.003)      -.007*(.003)
N                                    7,330                     7,330             7,330

Note: Values above represent unstandardized coefficients, with standard errors in parentheses. *p<.05, **p<.01, ***p<.001

The next section of this paper will first place the main findings of this paper in a broader context of the “Home Run Derby Curse.” It will then discuss possible avenues for further research.

Implications

The results above were mixed. In some instances, participation in the Derby, or success in the Derby, was statistically related to second half offensive decline, whereas in other tests, there was no relation between participation in the Home Run Derby and changes in offensive production between season halves. When using the full sample (N=7,330), the results showed that Derby participants can expect to see a greater drop in their XBPA between halves of the season than those who did not participate in the Derby. Moreover, those who have greater success in the Derby will see a greater drop in their XBPA between the first and second halves of the season in comparison to those who have not had as much success in the Derby. Further, when using the full sample of players, the results showed that those who participated in the Derby, as well as those who had greater success in it, can, on average, expect to see their second half SOPA increase more than that of Derby non-participants.

These findings, however, must be discussed in closer detail. As McCollum and Jaiclin (2010) pointed out in their piece, some of these results may be due to the often extraordinary performances of Derby participants in the first half of the season, and any decline is simply a regression to the mean.

In order to address this issue, in testing the effect of Derby participation and success on change in XBPA, I restricted the sample to those who showed above-average and extraordinary performances in the first half of the season. The effect of Derby participation and success on change in XBPA disappeared when the sample was restricted to those who showed average or above-average first halves. This suggests that hypotheses 1.1 and 1.2 are not confirmed, and lends support to McCollum and Jaiclin’s regression to the mean conjecture.

Turning towards the relationship between Derby success and change in SOPA between halves, an effect was initially found. This suggests that those who hit more home runs during the Derby tend to see an increase in their second half SOPA in comparison to their first half SOPA. This relationship, however, evaporates when a measure of home runs squared is included. This suggests a lack of robustness to this finding, and thus hypothesis 2.2 cannot be confirmed.

Based upon these findings, it appears that the “Home Run Derby Curse” is more of a Home Run Derby myth. The results concerning Derby participation and SOPA, however, appear to tell a different story. The test of hypothesis 2.1 shows that those who participate in the Derby see a larger increase in their SOPA between halves of the season compared to players who do not participate in the Derby. As stated above, it is unnecessary to restrict the sample based upon first half SOPA because those who participate in the Derby have, on average, a higher first half SOPA than the full sample mean. Thus, the argument that Derby participants have had an exceptionally strong first half does not apply in the case of SOPA.

Simply put, Derby participants do see a statistically significant increase in their SOPA in comparison to non-participants, suggesting that there is some credence to the “Home Run Derby Curse” and that it is caused by players changing their swings. The question that remains, however, is what is the substantive impact of participation in the Derby on SOPA?

The Substantive Effect of Derby Participation on SOPA

Essentially, Derby participants can expect their SOPA to increase between the first and second halves of the season by .005 more than that of players who do not participate in the Derby. The average first half SOPA for those who did not participate in the Derby is .1608855. The mean number of first half plate appearances for the sample used in this study, excluding those who participated in the Derby, is 249.5692. This means that the average Derby non-participant will strike out about 41 times in the first half of the season.

With all variables in the model held equal, the average second half SOPA will be .006 points higher than the average first half SOPA, about .1668855. The mean number of second half plate appearances for the sample used in this study, excluding those who participated in the Home Run Derby, is 219.5873. Therefore, an average Derby non-participant will strikeout about 37 times in the second half of the season. When the first half and second half are combined, an average player who did not participate in the Home Run Derby can expect to strikeout about 78 times.

The mean first half SOPA for a Derby participant is .1669383. The average number of first half plate appearances for Derby participants is 356.9345. Thus, the average Derby participant can expect to strikeout about 60 times in the first half of the season.

The mean second half SOPA for a Derby participant can be understood as:

2nd Half SOPA = .1669383 - α - β

Where α is the intercept (-.006) and β (-.005) is the coefficient for participation in the Derby. All other variables held constant at 0, Derby participants can expect a second half SOPA of .1779383. The average number of second half plate appearances for Derby participants is 291.7209. With all variables other than “Derby participation” held equal at 0, those who participate in the Derby can expect about 52 strikeouts in the second half of the season. This suggests that a Derby participant can expect to strike out 112 times during the season.

This is a substantial difference in strikeouts; however, in order to accurately assess the true substantive effect of Derby participation, one must utilize a common number of plate appearances across Derby participants and non-participants alike. For the purposes of this paper, I make the reasonable assumption that players will have about 300 plate appearances in the second half of the season.

Using 300 plate appearances, those who did not participate in the Derby can expect 50 strikeouts in the second half of the season, whereas those who did participate in the Derby can expect 53 strikeouts in the second half of the season.[xiv] This difference of three strikeouts does not seem substantively large.

Further, it must be noted that the coefficient for Derby participation of -.005 is only an estimate, with a 95% confidence interval ranging from -.0002 to -.01. If the true coefficient is -.01, this would amount to about 5 more strikeouts over 300 plate appearances in the second half of the season. If the true coefficient is -.0002, a player who participates in the Derby could expect, all other things held equal, to strike out 2 more times over 300 plate appearances in the second half of the season than a player who did not participate. In essence, the difference in SOPA between the halves of the season due to participation in the Derby is statistically significant, but substantively negligible.
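These back-of-the-envelope figures can be checked with a few lines of Python; the sketch below simply multiplies the SOPA values quoted above by the assumed 300 second half plate appearances.

```python
# Rough check of the strikeout arithmetic above, using the means quoted in the text
# and the assumed 300 second-half plate appearances.
SECOND_HALF_PA = 300
INTERCEPT = 0.006          # second-half increase in SOPA common to all players
DERBY_EFFECT = 0.005       # additional increase attributed to Derby participation

non_participant_sopa = 0.1608855 + INTERCEPT                 # non-participant first-half mean + intercept
participant_sopa = 0.1669383 + INTERCEPT + DERBY_EFFECT      # participant first-half mean + intercept + Derby effect

print(round(non_participant_sopa * SECOND_HALF_PA))   # about 50 strikeouts
print(round(participant_sopa * SECOND_HALF_PA))       # about 53 strikeouts
```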

Broader Implications and Future Research

Although the effects of Derby participation on SOPA are substantively minimal, the takeaway point of this study is that a “Home Run Derby Curse” does exist. Further, the confirmation of H2.1 suggests that Derby participants are altering their swings to develop more power during the Derby, and this is affecting their swing in the second half of the season.

Regardless of the substantive effects, this is an important finding. If a Derby participant’s swing is altered so greatly that he begins striking out at an even faster rate than non-participants in the second half of the season, the question that we must ask is, what other effects does this altered swing have? Does it increase a Derby participant’s flyball ratio? Are Derby participants more likely to see a drop in batting average and walks?

Beyond these questions, future research into the “Curse” should also focus on how the Derby alters a player’s swing. One possible avenue for future research lies in measuring changes in a hitter’s stance (e.g., the distance between his feet, the angle of his back elbow) after the Derby relative to his stance prior to the Derby.

Works Cited:

 J.P. Breen, “The Home Run Derby Curse,” FanGraphs, July 11, 2012, accessible via http://www.fangraphs.com/blogs/the-home-run-derby-curse/.

Jason Catania, “Is there Really a 2nd-Half MLB Home Run Derby Curse?,” Bleacher Report, July 15, 2013, accessible via http://bleacherreport.com/articles/1702620-is-there-really-a-second-half-mlb-home-run-derby-curse.

Derek Carty, “Do Hitters Decline After the Home Run Derby?,” Hardball Times, July 13, 2009, accessible via http://www.hardballtimes.com/do-hitters-decline-after-the-home-run-derby/.

Evan Kendall, “Does more power really mean more strikeouts?,” Beyond the Box Score, January 12, 2014, accessible via http://www.beyondtheboxscore.com/2014/1/12/5299086/home-run-strikeout-rate-correlation.

Tim Marchman, “Exploring the impact of the Home Run Derby on its participants,” Sports Illustrated, July 12, 2010, accessible via http://sportsillustrated.cnn.com/2010/writers/tim_marchman/07/12/hr.derby/.

Joseph McCollum and Marcus Jaiclin, “Home Run Derby Curse: Fact or Fiction?,” Baseball Research Journal 39(2), 2010.

[i] It could be argued that those players who participate in the Derby are also exceptional players, and therefore, this conjecture will not be correct for the majority of those who participate in the Derby. At first glance, this would appear to create a problem for the analysis in this piece, however, this is not so. This would only present a problem if being exceptional led to a greater positive correlation between a power swing and strikeouts, that is if a power stroke for exceptional players leads to more strikeouts than a power stroke for average players. If exceptional hitters are less likely to have a positive correlation between power and strikeouts, and Derby participants are exceptional players, we would expect to see a lower strikeout rate among these players when they begin attempting to hit for greater power. Essentially, a violation of this assumption leads to a more conservative measurement.

[ii] Some players who participated in the Derby were coded as “0,” as they did not hit any home runs.

[iii] A player is coded as a 1 if he was traded during the season and a 0 if he was not traded.

[iv] A player is coded as a 1 if the difference in his plate appearances between the first and second halves of the season (Pre All-Star Break PAs – Post All-Star Break PAs) is greater than the observed average in the data (39.8). A player is coded as a 0 if the difference in his plate appearances between the first and second halves of the season is less than the observed average in the data.

[v] A player is coded as a 1 if the number of plate appearances he had during the first half of the season is greater than the observed average in the data (342).

[vi] This variable is a dummy variable, with a player being coded as a 1 if he spent the entire season in the National League, and a 0 if he did not spend the entire season in the National League.

[vii] Although the “Steroid Era” is somewhat difficult to nail down, for the purposes of this paper, it is assumed to run from 1990 through 2005. Therefore, if an observation is in or between 1990 and 2005 it is coded as a 1. If an observation falls outside of this time period it is coded as a 0.

[viii] For the purposes of this paper, the era of “greenies” is deemed to run from 1985 through 2005. Therefore, if an observation is in or between 1985 and 2005 it is coded as a 1. If an observation falls outside of this time period it is coded as a 0.

[ix] Interleague play began in 1995 and continues through present. Therefore, if an observation is in or between 1995 and 2013 it is coded as a 1. If an observation falls outside of this time period it is coded as a 0.

[x] The difference in XBPA variable maintains the same basic distribution when the sample is restricted to those with an XBPA equal to or greater than the league average (.0766589), as well as equal to or greater than the Derby participant average (.1138781).

[xi] The mean first half XBPA for those who participate in the Derby is .114, whereas the mean first half XBPA for those who do not participate in the Derby is .077.

[xii] When this model is run including a variable for “home runs squared” the results remain similar.

[xiii] When this model is run including a variable for “home runs squared” the results remain similar.

[xiv] One could quibble with the estimate of 300 second half plate appearances; however, it is important to note that a Derby participant’s second half strikeout total increases over a non-participant’s strikeout total by .5 for every 100 plate appearances. Thus, if one were to use 200 plate appearances, the difference in average strikeout totals between Derby participants and non-participants for the second half of the season would be about 2.5. Additionally, if one were to use 400 plate appearances, the difference in average strikeout totals between Derby participants and non-participants for the second half of the season would be about 3.5.


The NL West: Time Zones, Ballparks, and Social Investing

I think the National League West is the most idiosyncratic division in baseball. Note that I avoided a more disparaging term, like odd or weird. That’s not what I’m trying to convey. It’s not wrong; it’s just…off. Not bad–it’s home to 60% of the last five World Champions, right?–but different. Let me count the ways. (I get three.)

Time zones

EAST COAST BIAS ALERT!

It is difficult for people in the Eastern time zone to keep track of the NL West. Granted, that’s not the division’s fault. But 47% of the US population lives in the Eastern time zone. Add the Central, and you’re up to about 80%. That means that NL West weeknight games generally begin around the time we start getting ready for bed, and their weekend afternoon games begin around the time we’re starting to get dinner ready. The Dodgers, Giants, and Padres come by it naturally–they’re in the Pacific time zone. The Diamondbacks and Rockies are in the Mountain zone, but Arizona is a conscientious objector to daylight saving time, presumably to avoid prolonging days when you can burn your feet by walking barefoot outdoors. So effectively, four teams are three hours behind the east coast and the other team, the Rockies, is two hours behind.

Here’s a list of the number of games, by team, in 2015 that will be played in each time zone, ranked by the number of games in the Mountain and Pacific zones, counting Arizona among the latter:

Again, I’m fully on board with the idea that this is a feature, not a bug. But it’s a feature that means that a majority, or at least a solid plurality, of the country won’t know, for the most part, what’s going on with the National League West teams until they get up in the morning.

Ballparks

OK, everybody knows that the ball flies in Coors Field, transforming Jose Altuve into Hack Wilson. (Check it out–they’re both 5’6″.) And the vast outfield at Petco Park turns hits into outs, which is why you can pencil in James Shields to lead the majors in ERA this year. But the other ballparks are extreme as well: Chase Field is a hitter’s park; Dodger Stadium and AT&T Park are pitchers’ havens. The Bill James Handbook lists three-year park factors for a variety of outcomes. I calculated the standard deviations for several of these measures (all scaled with 100 equal to league average) for the ballparks in each division. The larger the standard deviation, the more extreme the ballparks in the division play, in one direction or the other. The NL West’s standard deviations are uniformly among the largest. Here’s the list, with NL West in bold:

  • Batting average: NL West 10.1, AL West 7.2, AL Central 6.5, AL East 5.8, NL East 5.2, NL Central 1.6
  • Runs: NL West 26.5, NL Central 7.9, NL East 6.9, AL East 4.0, AL Central 2.8, AL West 2.7
  • Doubles: AL East 20.3, NL West 11.3, NL East 6.2, NL Central 5.9, AL Central 5.1, AL West 2.9
  • Triples: NL West 50.6, AL Central 49.5, NL East 33.6, AL West 28.3, AL East 27.8, NL Central 11.1
  • Home runs: NL Central 30.2, NL West 23.9, NL East 20.0, AL East 18.7, AL Central 11.3, AL West 11.2
  • Home runs – LHB: NL Central 31.6, AL East 27.4, NL West 25.6, NL East 21.7, AL West 14.7, AL Central 11.7
  • Home runs – RHB: NL Central 32.1, NL West 24.0, NL East 20.0, AL East 14.4, AL Central 13.6, AL West 10.2
  • Errors: AL East 17.7, NL West 12.2, NL Central 11.6, NL East 11.5, AL West 11.2, AL Central 8.2
  • Foul outs: AL West 36.2, AL East 18.3, NL West 16.0, NL Central 15.2, AL Central 13.8, NL East 6.2

No division in baseball features the extremes of the National League West. The ballparks are five fine places to watch a game, but their layouts and geography do make the division idiosyncratic.
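If you want to reproduce this sort of comparison yourself, the calculation is nothing more than a standard deviation taken across each division’s five park factors. Here is a small sketch; the park factor numbers are made-up placeholders rather than the Bill James Handbook values.

```python
# Sketch of the standard-deviation calculation described above. The park factors here
# are made-up placeholders, not the Bill James Handbook values.
import statistics

run_park_factors = {
    "NL West": [140, 96, 92, 90, 105],     # five parks, scaled so 100 = league average
    "NL Central": [108, 103, 99, 97, 95],
}

for division, factors in run_park_factors.items():
    print(division, round(statistics.pstdev(factors), 1))
```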

Social Investing

You may be familiar with the concept of social investing. The idea is that when investing in stocks, one should choose companies that meet certain social criteria. Social investing is generally associated with left-of-center causes, but that’s not really accurate. There are liberal social investing funds that avoid firearms, tobacco, and fossil fuel producers and favor companies that offer workers various benefits. But there are also conservative social investing funds that don’t invest in companies involved in alcohol, gambling, pornography, and abortifacients. This isn’t a fringe investing theme: By one estimate, social investing in the US totaled $6.57 trillion at the beginning of 2014, a sum even larger than the payrolls of the Dodgers and Yankees combined.

Here’s the thing about social investing: You’re giving up returns in order to put your money where your conscience is. That’s OK, of course. The entire investing process, if you think about it, is sort of fraught. You’re taking your money and essentially betting on the future performance of a company about which you know very little. Trust me, I spent a career as a financial analyst: I don’t care how many meals you eat at Chipotle, or how many people you know at the Apple Genius Bar, you can’t possibly know as much about the company as a fund analyst who’s on a first-name basis with the CEO. So there’s no sense in making it even harder on yourself by, say, investing in the company guilty of gross negligence and willful misconduct in a major oil spill, if that’d bother you.

Note that I said that with social investing, you’re giving up returns. Some social investing proponents would disagree with me. They claim that by following certain principles that will eventually sway public opinion or markets or regulations, they’re investing in companies that’ll perform better in the long run. That’s a nice thought, but social investing has been around for decades, and we haven’t yet hit that elusive long run. The Domini 400 Index, which was started in 1990, is the oldest social investing index. It started well in the 1990s, but has lagged market averages in the 21st century. Now called the MSCI KLD 400 Social Index, it’s been beaten by the broad market in 10 of the past 14 years. It has underperformed over the past year, the past three years, the past five years, and the past ten years, as well as year-to-date in 2015. The differences aren’t huge, but they’re consistent. Maybe for-profit medicine is an aberration, but acting on that meant that you missed the performance of biotechnology stocks last year, when they were up 47.6% compared to an 11.4% increase for the S&P 500. Maybe we need to move toward a carbon-free future, but stocks of energy companies have outperformed the broad market by over 100 percentage points since January 2000. I think that most social investing investors are on board with this tradeoff, but some of the industry proponents have drunk the Kool-Aid of beating the market. That’s just not going to happen consistently. In fact, a fund dedicated to tobacco, alcohol, gambling, and defense (aka “The Four B’s:” butts, booze, bets, and bombs) has outperformed the market as a whole over the past ten years.

OK, fine, but what does this have to do with the National League West? Well, two of its members have, in recent years, made a point of pursuing a certain type of player, just as social investing focuses on a certain type of company. The Diamondbacks, under general manager Kevin Towers and manager Kirk Gibson, became a punchline for grit and dirty uniforms and headhunting. (Not that it always worked all that well.) The Rockies, somewhat less noisily, have pursued players embodying specific values. Co-owner Charlie Monfort (a man not without issues) stated back in 2006,  “I don’t want to offend anyone, but I think character-wise we’re stronger than anyone in baseball. Christians, and what they’ve endured, are some of the strongest people in baseball.” Co-owner Dick Monfort described the team’s “culture of value.” This vision was implemented by co-GMs (hey, Colorado starts with co, right?) Dan O’Dowd and Bill Geivett. (OK, O’Dowd was officially GM and Geivett assistant GM, but the two were effectively co-GMs, with Geivett primarily responsible for the major league team and O’Dowd the farm system).

Now, there’s nothing wrong with players who are also commendable people. You could do a lot worse than start a team with Clayton Kershaw and Andrew McCutchen, to name two admirable stars. Barry Larkin was a character guy. So was Ernie Banks. Brooks Robinson. Walter Johnson. Lou Gehrig. All good guys.

But holding yourself to the standards set by the Diamondbacks and Rockies also means you’re necessarily excluding players who are, well, maybe more characters than character guys.  Miguel Cabrera has proven himself to be a tremendous talent and a somewhat flawed person. Jonathan Papelbon has a 2.67 ERA and the most saves in baseball over the past six years, but he’s done some things that are inadvisable. Carlos Gomez, a fine player, second in the NL in WAR to McCutchen over the past two years, has his detractors. Some of the players whom you’d probably rather not have your daughter date include Babe Ruth, Ty Cobb, Rogers Hornsby, Barry Bonds, and many of the players and coaches of the Bronx Zoo Yankees.

I want to make a distinction here between what the Diamondbacks and Rockies did and the various “ways” that teams have–the Orioles Way, the Cardinals Way, etc. There’s plenty of merit in developing a culture starting in the low minors that imbues the entire organization. That’s not what Arizona and Colorado did. They specified qualities for major leaguers, and, in the case of the Diamondbacks at least, got rid of players who didn’t meet them. I don’t know what’s wrong with Justin Upton, but for some reason Towers didn’t like something about him and traded him away. The Braves make a big deal about character, but of course they traded for Upton, so the Diamondbacks went way beyond anything the Braves embrace.

In effect, what the Diamondbacks and Rockies have done is like social investing. They’ve viewed guys who don’t have dirty uniforms or aren’t good Christians or something the same way some investors view ExxonMobil or Anheuser-Busch InBev. Again, that’s their prerogative, but it loses sight of the goal. Investors want to maximize their returns, but as I said, most social investors realize that by focusing on only certain types of stocks, they’ll have slightly inferior performance. They’ll give up some performance in order to hew to their precepts. Baseball teams want to maximize wins, and there really isn’t any qualifier related to precepts you can append to that.

The Rockies and Diamondbacks were living under the belief that by focusing on only certain types of players, they could have superior performance. It’s like the people who think they can beat the market averages through social investing. It hasn’t happened yet. And, of course, the Diamondbacks and Rockies were terrible last year, with the worst and second-worst records in baseball. Just as social investing doesn’t maximize profits, the baseball version of social investing didn’t maximize wins in Phoenix or Denver.

I’ve used the past tense throughout this discussion. Towers, Gibson, O’Dowd, and Geivett are gone, replaced by GM Dave Stewart and manager Chip Hale in Arizona and GM Jeff Bridich in Colorado. (The Monforts remain.) Last year, the Diamondbacks created the office of Chief Baseball Officer, naming Tony LaRussa, a Hall of Fame manager who’s been less than perfect as a person and in the types of players he tolerates. These moves don’t change that these are both bad teams. But by pursuing a well-diversified portfolio of players going forward, rather than a pretty severe social investing approach, both clubs, presumably, can move toward generating market returns. Their fans, after all, never signed on to an approach that willingly sacrifices wins for the sake of management’s conscience.


The Atlantazona Bravebacks (Part I: Position Players)

This post was inspired by a question posed by one Pale Hose, in the most recent iteration of the Fangraphs after Dark Chat.

9:08 Comment From Pale Hose
Would you rather be the Braves/Diamondbacks combined roster or the Red Sox? They are roughly equal by depth chart WAR.

Unfortunately, a baseball team cannot pull a Noah and bring two of each position out on any given night. Therefore, in this particular exercise we’ll be using Steamer projections and the depth charts maintained on this very site to explore the position player depth chart of a hypothetical Braves/Diamondbacks combined roster. Let’s call our mashup team the Atlantazona Bravebacks. Because the author of this article is a confirmed leech who is incapable of coming up with original ideas, I’ll be splitting this series into multiple posts.

Note: Yasmany Tomas isn’t currently featured in the aforementioned Arizona Diamondbacks depth chart, so he’ll have to sit this one out.

*No baseball players real or fake were hurt in the creation of this team*

 

C: Christian Bethancourt 1.0 WAR

As face-punch worthy as A.J. Pierzynski is, he’s probably one of the two best catching options because dear-god-the-Arizona-Diamondbacks-catching-situation-is-worse-than-Dusty-Baker’s-two-hole-hitters. But it’s okay ’cause Dave Stewart says so. Christian Bethancourt is actually a catcher, unlike Arizona’s apparent long-term option at the position, although there’s still a chance that Mr. O’Brien and his bat sneak onto the roster. We’ll just steal the Braves’ current depth chart here, with Bethancourt on top, being backed up by the always lovable Pierzynski, giving the Bravebacks an even 1.0 projected WAR out of the catching position.

Christian Bethancourt 0.7 WAR in 448 PA

A.J. Pierzynski 0.3 WAR in 192 PA

 

1B: Paul Goldschmidt 5.5 WAR

Well, both of these teams have first basemen who are A) relatively young and B) projected to post something resembling or greater than four wins above replacement. Since both teams are National League teams, and two wrongs make a right, we’ll give the Bravebacks a DH. It appears that Steamer believes that Paul Goldschmidt is a +7 1B and Freeman is something resembling a +2 or +3, so we’ll move Freeman to DH, although he’ll occasionally get some time at first base to rest the typically durable Paul Goldschmidt.

Paul Goldschmidt: 5.3 WAR in 665 PA

Freddie Freeman: 0.2 WAR in 35 PA

 

2B: Aaron Hill 1.3 WAR

Well, I’m sure the Braves hope Jose Peraza gets here soon, because when your projected starter has a WAR starting with a “-” sign, you know it’s gonna be a long year. Fortunately, Aaron Hill is still capable of providing some value, even at his advanced age. Chris Owings features a very promising projection for a player of his age, and something resembling a 50/50 time split between the two should at least prevent second base from being a black hole for the Bravebacks.

Aaron Hill: 0.6 WAR in 385 PA

Chris Owings: 0.7 WAR in 315 PA

 

3B: Jacob Lamb 1.8 WAR

Aaron Hill will split time with Chris Owings at second base, allowing him to log fairly significant time as something of a platoon partner for the left-handed hitting third base prospect, Jacob Lamb. Lamb, like Owings, receives a very encouraging projection for a player of his relatively young age. A Lamb/Hill platoon should be enough to hold the fort down for the Bravebacks. Chris Johnson was employed by both of these teams at one point, but it appears that the Bravebacks have no interest in employing this particular one-tool BABIP beast.

Jacob Lamb 1.4 WAR in 455 PA

Aaron Hill 0.4 WAR in 245 PA

 

SS: Andrelton Simmons 4.2 WAR

Can you say Platinum Glove? Andrelton Simmons wins the team’s shortstop job easily. Simmons is the premier defender in the sport at his position, and isn’t a total black hole offensively. He’s currently projected to see almost all of the team’s plate appearances here, with Chris Owings making a spot start every once in a while to spell Simmons. Although Simmons might not add the offense that a Freeman or Goldschmidt adds, he makes up for that with his defense, establishing himself as one of the premium players on the upstart Bravebacks.

Andrelton Simmons 4.1 WAR in 644 PA

Chris Owings 0.1 WAR in 56 PA

 

LF: Mark Trumbo 1.2 WAR

Mark Trumbo provides Right Handed Power ™ and not much else in left field. Fortunately, he won’t see quite a full slate of plate appearances here, as he’ll spend some time at DH when either Freeman or Goldschmidt needs a breather. David Peralta will slot in behind him, seeing some fairly significant time in left field, providing some much needed defense and athleticism that Trumbo can’t provide.

Mark Trumbo 0.8 WAR in 487 PA

David Peralta 0.4 WAR in 213 PA

 

CF: A.J. Pollock 2.4 WAR

Pollock is one of the better position players on this team, even making MLB Network’s Top 10 Right Now list for center fielders. If he can stay on the field and play a full season, his combination of athleticism and power could make him a very productive player. Pollock might have the most upside out of the 6 starting position players who haven’t already established a high performance baseline, as he proved to be quite powerful last season. Given what we know about the Arizona Diamondbacks and Right Handed Power ™ he could be the long-term solution in center field for them, and for our Bravebacks.

A.J. Pollock 2.1 WAR in 550 PA

David Peralta 0.3 WAR in 150 PA

 

RF: Nick Markakis 1.0 WAR

Wow, that contract was confusing. Well, as long as he’s here he might as well play. Nick Markakis is Nick Markakis. Dependably mediocre. Consistently below average. Reliably meh. Fortunately, he has a better group of players assisting him in the outfield with the Bravebacks than he will in real life this season. David Peralta backs him up in the limited time that he is expected to miss, although neck injuries can be tricky. To be perfectly honest, though, this team wouldn’t lose anything if Peralta had to take over for an extended period of time.

Nick Markakis 0.9 WAR in 616 PA

David Peralta 0.1 WAR in 84 PA

 

DH: Freddie Freeman 3.4 WAR

Freddie Freeman is a better hugger and defender than your typical DH, so his WAR takes a bit of a dip moving from 1B to DH. However, he can still provide significant value here, and create a potent left-right tandem in the middle of the Bravebacks batting order. Mark Trumbo sees some time here because any time he’s not spending in the outfield is time well spent. DH figures to be a real strength on this team, something many American League teams wish they could say.

Freddie Freeman 3.1 WAR in 609 PA

Mark Trumbo 0.3 WAR in 91 PA

 

Wow, this roster looks stronger than I thought it would. Although this team is fairly imbalanced, featuring three stars in Simmons, Freeman and Goldschmidt, those three are enough to make up for below-average production in the outfield and behind the plate. Chris Owings and David Peralta make for reasonably solid bench contributors, and A.J. Pierzynski provides cuddly joy, while also providing what Steamer thinks will be reasonable production out of a backup catcher.

 

Now for our projected lineup:

RF (L) Nick Markakis

CF (R) A.J. Pollock

1B (R) Paul Goldschmidt

DH (L) Freddie Freeman

LF (R) Mark Trumbo

3B (L) Jacob Lamb

2B (R) Aaron Hill

SS (R) Andrelton Simmons

C (R) Christian Bethancourt

Nick Markakis provides a solid OBP option at the top of the order, and AJ Pollock has an interesting set of abilities, making him a high-upside play in the number two slot. The two star first basemen form a potent 3-4 combo, and having Freeman in the four-hole splits up Goldschmidt and Mark Trumbo, who isn’t the same quality hitter as the first two but brings plenty of Right Handed Power ™ to the table as a supporting piece. Lamb and Hill can both be solid down-order offensive contributors, and Simmons and Bethancourt are defensive standouts, who certainly haven’t been given starting jobs based on their offensive abilities.

If we add the above WAR totals, we get 21.8 WAR, tying the Giants and Indians for 14th place in Major League Baseball. Seeing that both of these teams should be fairly competitive in 2015, it looks like fans in Atlantazona have good reason to be enthused about the coming campaign. If only fans of the real Braves and Dbacks could say the same.
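If you want to double-check that tally, the arithmetic is just a sum of the positional totals listed above.

```python
# Quick tally of the projected positional WAR totals listed above for the Bravebacks.
position_war = {
    "C": 1.0, "1B": 5.5, "2B": 1.3, "3B": 1.8, "SS": 4.2,
    "LF": 1.2, "CF": 2.4, "RF": 1.0, "DH": 3.4,
}
print(round(sum(position_war.values()), 1))   # 21.8
```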


When Should You Draft Troy Tulowitzki?

In the fantasy baseball world, Troy Tulowitzki is the Lamborghini that is terrific when it’s on the road but spends too much time in the garage. Since becoming a regular in 2007 (eight years), Tulowitzki has had just three seasons in which he played more than 140 games and none of those seasons were in the last three years. He’ll be 30 years old during the 2015 season, so age is not on his side when it comes to health.

Last year was the most tantalizing and ultimately disappointing season of all. Tulowitzki was off to a tremendous start, hitting .340/.432/.603 through 91 games. He was hitting like a vintage Albert Pujols but at the shortstop position. In a little more than half a season, he accumulated 5.1 WAR and had a career-best 171 wRC+.

Then it happened—the yearly injury. On July 19th, in Pittsburgh, Tulowitzki strained his left hip flexor while running to first base and his season was over. Despite playing just 91 games, Tulowitzki ranked 73rd in Zach Sanders’ End of Season Rankings for 2014. Sanders had Tulowitzki worth $16.08, right in the same ballpark as Ryan Braun ($16.33), Jonathan Lucroy ($16.15), and Jimmy Rollins ($16.01). These rankings were based on a 12-team, 5 x 5 league with one catcher, so Tulowitzki’s placement at 73rd would make him the first pick in the 7th round, despite playing just over half the season. Of course, the pre-season consensus rankings of FanGraph writers had Tulowitzki anywhere from 12th to 20th, so 73rd was a big disappointment.

Over the last three years, Tulowitzki has averaged 88 games and 363 plate appearances per season, with a batting line of .316/.399/.551. You know he’s going to play well, you just don’t know how much he’ll play. So what do you do with Tulo on draft day?

First, let’s look at his injury history.

After a 25-game cup of coffee in 2006, Troy Tulowitzki became a Rockies regular in 2007, playing 155 games as a 22-year-old.

In 2008, Tulowitzki hit the disabled list twice. On April 29th, Tulo was not in the original starting lineup but was put in the game at the last minute when Jeff Baker broke a blood vessel in his throwing hand during pregame warm-ups. He then tore his left quadriceps on a defensive play in the first inning. He came off the DL on June 20th and played regularly until July 5th, when he went back on the DL with a cut hand. During the game on the previous day, Tulowitzki hit his bat against the ground in frustration, only to have the bat shatter and cut the palm of his hand up to his index finger. He would need 16 stitches and miss the next two weeks. With the two injuries limiting him to 101 games, Tulowitzki had the least-productive year of his career, other than the first-year 25 game stint.

In 2009, Tulo played 151 games and had 5.5 WAR, one of six seasons with 5 or more WAR in his career.

The 2010 season saw Tulowitzki pick up right where he left off in 2009. Through 62 games, he was hitting .306/.375/.502. Then, on June 17th, he was hit by a pitch from Alex Burnett and fractured his left wrist. The injury kept him out for five weeks but he came back better than ever, hitting .323/.386/.634 after the injury. Despite playing in just 122 games, Tulowitzki had his best season in 2010, accumulating 5.9 WAR.

Tulowitzki played 143 games in 2011. Nothing to see here.

In 2012, Tulo was off to a slow start, hitting just .287/.360/.486. On May 30th, he strained his groin while running out a ground ball and his season was over. He played just 47 games that year.

Two years ago, Tulowitzki missed nearly a month in the middle of the season with a fractured rib from diving for a ground ball. He still hit .312/.391/.540 and had 5.4 WAR in 126 games.

Last year, as mentioned above, Tulo was off to his best start ever but his season ended in July because of a strained hip flexor sustained while running to first base.

So, over the last seven years, Tulowitzki has been on the DL six times. Twice he was hurt while running to first base. Twice he was hurt while making a play on defense. And twice he was hurt through what I would call flukes—slamming his bat into the ground and getting hit by a pitch. He’s had a torn quadriceps, strained groin, fractured rib, strained hip flexor, cut hand, and fractured wrist. On the one hand, two of those were flukes and he hasn’t hurt the same body part more than once. On the other hand, the quad, groin, and hip flexor are all lower-body type injuries, which could continue to occur as he gets older.

So how much do you factor in the injury history when considering when to draft Troy Tulowitzki in 2015?

On his player page, Tulo is projected for 525 at-bats by Steamer and 467 by the Fans (23 fan projections). He hasn’t reached either of those totals since 2011. I have projections from other sources that are more conservative with his playing time:

Cairo: 352 AB

ZiPS: 381 AB

Marcel: 410 AB

Davenport: 417 AB

CBS: 466 AB

Average them all together and Tulowitzki is projected for 431 at-bats.

So let’s go to the spreadsheet. I created dollar values using the z-scores method for a 12-team, 5 x 5, one-catcher league and came up with the following.
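For the curious, here is a rough sketch of the z-scores idea, not my actual spreadsheet: standardize each category across the pool of draftable hitters and sum the z-scores. The pool below is a made-up placeholder, and a real valuation would include every draftable hitter and weight batting average by at-bats.

```python
# A minimal sketch of the z-scores valuation idea, not the actual spreadsheet used here.
# The pool below is a made-up placeholder; a real valuation would include every
# draftable hitter and would weight AVG by at-bats before standardizing.
import pandas as pd

pool = pd.DataFrame(
    {"R":   [85, 104, 70, 60],
     "HR":  [28, 35, 23, 12],
     "RBI": [91, 107, 75, 55],
     "SB":  [3, 11, 2, 20],
     "AVG": [0.302, 0.310, 0.302, 0.270]},
    index=["Tulowitzki (525 AB)", "Hitter A", "Hitter B", "Hitter C"],
)

z_scores = (pool - pool.mean()) / pool.std(ddof=0)   # standardize each category
pool["z_total"] = z_scores.sum(axis=1)               # higher total = earlier draft pick
print(pool["z_total"].sort_values(ascending=False))
```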

Scenario #1: If Tulowitzki gets the 525 at-bats projected by Steamer, he would be a first-round pick, right there with Jose Abreu and Paul Goldschmidt. (525 AB, 159 H, 85 R, 28 HR, 91 RBI, 3 SB, .302 AVG).

Scenario #2: If Tulowitzki gets 431 at-bats (the average of the seven sources I’ve collected) and his other numbers are pro-rated to that total, he drops down to around the middle of the 4th round. (431 AB, 131 H, 70 R, 23 HR, 75 RBI, 2 SB, .302 AVG).

Scenario #3: If Tulowitzki gets 314 at-bats (his average over the last three years) and his other numbers are pro-rated to that total, he drops to the 18th round. (314 AB, 95 H, 51 R, 17 HR, 54 RBI, 2 SB, .302 AVG).

But that’s not the whole story because I haven’t factored in an injury replacement. When Tulowitzki gets injured, he usually heads straight to the DL and generally misses significant time. He’s not like an aging Chipper Jones, who would play 4 or 5 games a week and would be difficult to replace if you can’t make daily moves.

So let’s factor in an injury replacement for Tulo if he misses some time. I took the average of three “replacement-level” shortstops from my spreadsheet (Jordy Mercer, Wilmer Flores, and Yunel Escobar) and pro-rated them to the amount of time Tulowitzki would miss in the latter two scenarios from above.

Scenario #2, Adjusted:

431 AB, 131 H, 70 R, 23 HR, 75 RBI, 2 SB, .302 AVG—Tulowitzki

94 AB, 24 H, 10 R, 2 HR, 10 RBI, 1 SB, .258 AVG—Jordy Flores Escobar

525 AB, 155 H, 80 R, 25 HR, 85 RBI, 3 SB, .295 AVG—Tulo & Friends

Add in 94 at-bats from a replacement-level shortstop to Tulowitzki’s projected stats, which would bring his total to the 525 at-bats projected by Steamer, and Tulowitzki would drop from the middle of the 1st round to the middle of the 2nd round; still definitely worth having on your team.

Scenario #3, Adjusted:

314 AB, 95 H, 51 R, 17 HR, 54 RBI, 2 SB, .302 AVG—Tulowitzki

211 AB, 54 H, 23 R, 5 HR, 23 RBI, 1 SB, .258 AVG—Jordy Flores Escobar

525 AB, 149 H, 74 R, 22 HR, 77 RBI, 3 SB, .284 AVG—Tulo & Friends

Add in 211 at-bats from a replacement-level shortstop to Tulowitzki’s projected stats, which would bring his total to the 525 at-bats projected by Steamer, and Tulowitzki would drop from the middle of the 1st round to the middle of the 4th round. That’s the somewhat realistic downside risk.
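Those blended lines are easy to approximate with a small helper that pro-rates Tulowitzki’s 525 at-bat projection and fills the remaining at-bats with a replacement-level shortstop. The rates below come straight from the scenarios above, so rounding may differ slightly from the totals shown.

```python
# Rough sketch of the "Tulo & Friends" blending above: pro-rate Tulowitzki's 525 AB
# projection down to an assumed AB total, then fill the rest with a replacement-level
# shortstop. Rates are taken from the scenarios above; rounding may differ slightly.
FULL_AB = 525
tulo_525 = {"H": 159, "R": 85, "HR": 28, "RBI": 91, "SB": 3}            # Steamer-based line
repl_per_ab = {"H": 24 / 94, "R": 10 / 94, "HR": 2 / 94, "RBI": 10 / 94, "SB": 1 / 94}

def blend(tulo_ab):
    repl_ab = FULL_AB - tulo_ab
    line = {stat: round(full * tulo_ab / FULL_AB + repl_per_ab[stat] * repl_ab)
            for stat, full in tulo_525.items()}
    line["AVG"] = round(line["H"] / FULL_AB, 3)
    return line

print(blend(431))   # roughly Scenario #2, adjusted
print(blend(314))   # roughly Scenario #3, adjusted
```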

Looking at the three scenarios above, we have:

  • A fully-healthy Tulowitzki is a mid-1st round pick.
  • A somewhat healthy Tulowitzki (using the average of 7 projection sources) plus a replacement-level shortstop used for the time missed and Tulo drops to the middle of the 2nd round.
  • A healthy-as-he’s-been-the-last-three-years Tulowitzki (using an average of his at-bats over the last three years) plus a replacement-level shortstop and Tulo drops to the middle of the 4th round.

In the Rotographs’ Top 300, the five participants had Tulowitzki with an average pick of 29th overall, just one spot behind Ian Desmond. In that Top 300, Zach Sanders had Tulo ranked 75th, which was quite the outlier (the others had Tulo from 15th to 28th). If you remove Sanders’ rankings, Tulowitzki would be the 17th player off the board, which would put him right in line with Scenario #2 from above—the middle of the 2nd round.

Everyone has his own appetite for risk, but I would go ahead and roll the dice on Troy Tulowitzki in 2015.