Archive for January, 2017

What to Do With Justin Upton?

Justin Upton is still only 29.

It can be easy to forget about a guy who hasn’t come close to a peak that was over five years ago, but few can maintain the level of excellence that was Justin Upton’s 2011 season. The hype built from a year like that is huge: the 2005 No. 1 overall pick posted a six-win season at 23, with 31 HR and 21 steals. It’s pretty exciting.

Guys who have enough power to do this are generally pretty talented.

Flash forward to 2016.

Fresh off signing a six-year, $132-million contract, Upton posted a 77 wRC+ along with a .235/.289/.381 slash line in the first half of the season. He struck out in nearly a third of his plate appearances and held a walk rate below his career average.

Most importantly, though, when the Tigers paid Upton big money, they paid him to hit dingers and knock the ball around the yard for extra bases. So someone like him hitting nine homers with a .146 ISO over 350 plate appearances is worrisome.

Yet at the end of the season, Upton ended up with an overall wRC+ of 105, and an ISO of .219.

For qualifying batters, Upton held the crown for the highest second-half ISO increase (.172) while having the fourth-highest ISO of the second half (.318). Meanwhile, he held a second-half wRC+ of 142.
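For anyone who wants to check the arithmetic, ISO is simply slugging percentage minus batting average. The short sketch below uses only the first-half slash line and second-half ISO quoted above; the function name is just for illustration.

```python
def isolated_power(slg: float, avg: float) -> float:
    """ISO is slugging percentage minus batting average."""
    return round(slg - avg, 3)

# First-half slash line quoted above: .235/.289/.381
first_half_iso = isolated_power(slg=0.381, avg=0.235)

# Second-half ISO quoted above
second_half_iso = 0.318

# Size of the second-half jump in ISO
iso_gain = round(second_half_iso - first_half_iso, 3)

print(first_half_iso, iso_gain)
```

The two printed values line up with the .146 first-half ISO and the .172 increase cited in the text.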

Now, I do understand that he had 86 fewer plate appearances in the second half (356 vs. 270), so it is reasonable to take the ISO and wRC+ increases with a grain of salt. But Upton slugged 22 homers in 270 trips, good for the fourth-most in the second half, behind walking flame Brian Dozier, Khris “I hit dingers through the marine layer” Davis, and Jedd Gyorko (??!!??!).

I don’t know whether people expect Upton to start breaking down, or expected him to have done so already, but he continues to crush the ball. For what it’s worth, on average he hits the ball as hard as Paul Goldschmidt (92.3 MPH) and barrels up balls at the same rate as Kris Bryant (7.7% Brls/PA) for an expected ISO of .235 (thanks to Billy Stampfl’s eISO equation). Just for fun, he set his max exit velocity at 114 MPH on his last homer of the year.

Upton is projected by Steamer for a .346 wOBA and 116 wRC+ this season, good for a 2.1 WAR. Given his ability to crush baseballs and his age, I still think Upton has a good chance of surpassing his projections. He’s shown that he has big power, but his first half is weighing his projections down.

The Tigers’ plans going forward hinge on whether they can put themselves in playoff position during the first half before they think about any sort of fire sale. They’re projected to be in the thick of the wild-card race, so they may run a repeat of 2016 and push through to see how close they come. But they can’t keep paying an old core this way through the next three to four years. In cases like Miguel Cabrera and Victor Martinez, they don’t have much choice but to eat those contracts until they run out.

If things don’t go as planned, the first thing they can do is ship off J.D. Martinez as a rental and start to rebuild a mostly barren farm system. Martinez is due to become a free agent at the end of the year and there is little chance that the Tigers will offer him the lucrative contract extension that he most likely wants. Ian Kinsler could go next to whoever needs a second baseman and is willing to accept his age. If they really wanted, the Tigers could also see if they could send off Justin Verlander (provided that they eat a sizable chunk of his contract).

But the best move for the Tigers could come in the form of trading a resurgent Justin Upton, who can prove that his second-half numbers were no fluke and that the 29-year-old can maintain solid power, as he has throughout his career. It’s tough to find a home for Upton, but the Yankees might be the ones willing to take on his contract, as they’ll be done with CC Sabathia’s monster contract at the end of this year, and Brett Gardner’s contract in 2018. A team like the Yankees might prefer to take Upton and his $22-million AAV rather than test the market for sluggers like J.D. Martinez (also 29) and have to possibly pay more. The Yankees might be willing to take on some of the money and go with a safer outfield bet in Upton rather than having to wait for Aaron Judge or Clint Frazier to become steady contributors. This Yankees team looks like they’re trying to win now, given that they just signed Matt Holliday and Aroldis Chapman, so they might be willing to part with a few prospects at the deadline.

Left field is a weak position right now, and a contender could be looking for a power bat to provide 2-3 wins a year. Should Justin Upton carry his second-half resurgence into 2017, his bat could be too good to pass up, and the Tigers could move his contract and get something going towards a rebuild.


How Often Is the “Best Team” Really the Best?

We know the playoffs are a crapshoot. A 5- or 7-game series tells us very little about which team is actually the better team. But it is easy to forget that the regular season is a crapshoot, too, just with a larger sample size. Teams go into a given game with a certain probability of winning, based on their true-talent levels (i.e., their probability of winning a game against a .500 team). And then, as luck decides, one team wins and the other loses. A season is just the sum total of 162 luck-based games for each team, and there is no guarantee that the luck must even out in the end.

After the regular season, the team with the best record is usually proclaimed “the best team in baseball.” It was the Cubs this year, and the Cardinals the year before, and the Angels the year before that. But were those teams really the best? We can’t tell just by looking at their records. It would be great if we knew the true-talent level of every team. But baseball doesn’t give us probabilities of teams winning; it only gives us outcomes. The same flaw exists for Pythagorean Record, BaseRuns, or any other metric you might use to evaluate a team at season’s end. BaseRuns gets the closest to a team’s true-talent level, because it uses a sample size of thousands of plate appearances, but it’s still an estimate based on outcomes, and not the underlying probabilities of those outcomes.

I wanted to know what the probability is that the team with the most true talent finishes the regular season with the best record in baseball. Since there’s no way to test that empirically, I ran a simulation in R. For each trial of the simulation, every team was assigned a random true-talent level from a normal distribution (see Phil Birnbaum’s blog post for my methodology, although I based my calculations for true-talent variance off of win totals from the two-wild-card era). The teams then played through the 2017 schedule, with each game being simulated using Bill James’ log5 formula. If the team with the most wins matched the team with the most true talent, that trial counted as a success. Trials in which two or more teams tied for the most wins were thrown out altogether.
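The original simulation was written in R and is not reproduced here, but the procedure described above can be sketched in a few lines of Python. Everything beyond the log5 formula itself is an assumption made for illustration: the `talent_sd` value, the clamping of talent levels, and the balanced schedule (every pair of teams meets `games_per_pair` times) stand in for the article’s variance estimate and the real 2017 schedule.

```python
import random

def log5(p_a: float, p_b: float) -> float:
    """Bill James' log5 estimate of the chance that team A beats team B,
    where each input is a team's win probability against a .500 team."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

def simulate_season(rng, n_teams=30, talent_sd=0.06, games_per_pair=6):
    """Play one season; return (index of most talented team, win totals).

    talent_sd and the balanced schedule are illustrative assumptions,
    not the values or the schedule used in the article.
    """
    # Draw true talent from a normal distribution, clamped away from 0 and 1.
    talent = [min(max(rng.gauss(0.5, talent_sd), 0.05), 0.95)
              for _ in range(n_teams)]
    wins = [0] * n_teams
    for a in range(n_teams):
        for b in range(a + 1, n_teams):
            p = log5(talent[a], talent[b])
            for _ in range(games_per_pair):
                if rng.random() < p:
                    wins[a] += 1
                else:
                    wins[b] += 1
    best_talent = max(range(n_teams), key=talent.__getitem__)
    return best_talent, wins

def share_best_record_is_best_team(n_trials=500, seed=1):
    """Among seasons with a sole best record, how often is that team
    also the one with the most true talent?"""
    rng = random.Random(seed)
    counted = successes = 0
    for _ in range(n_trials):
        best_talent, wins = simulate_season(rng)
        top = max(wins)
        leaders = [i for i, w in enumerate(wins) if w == top]
        if len(leaders) > 1:
            continue  # ties for the best record are thrown out
        counted += 1
        successes += leaders[0] == best_talent
    return successes / counted
```

Calling `share_best_record_is_best_team()` returns the fraction of untied seasons in which the best record belonged to the most talented team. Because the inputs here are placeholders, the number it produces should not be expected to match the article’s result.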

I ran through one million simulated seasons using this method. In 91.2% of them, a single team finished with the best record in the league. But out of those seasons, the team with the best record matched the team with the most true talent only 43.1% of the time.

So, given that a team finishes with the best record in baseball, there is a 43.1% chance that they are actually the best team. More likely than not, some other team was more talented. Even after 162 games, we can’t really be sure who deserved to come out on top.


G-Beards v. W-Snappers: A New All-Star Event

A lot has been written about the youth movement in professional baseball. A bulge of pre-arb and arb-eligible studs is pushing out the hobbled gritty vets and reworking how the old ballgame is played, structured, and thought about. Aside from bullpen usage, this may be the biggest current trend in baseball, and the defining trait of the post-Moneyball, big-data, and steroid eras. The value and role remaining for baseball’s seniors is a question playing out on the field (Trea Turner) and in contract negotiations (Jose Bautista and Mike Napoli). But what if it was actually played out straight-up man versus child, craft versus skill, knee brace versus jock strap, once a year? A one-game exhibition between players 24 and under and players 34 and older. Graybeards versus Whippersnappers.

While the All-Star Game promises to be more entertaining now that Bud’s “This time it counts!” policy is no más, the majority of FanGraphs fans still likely prefer the weekend’s other event, the Futures Game. It’s a chance to see the abstract names and grades we’ve read about for so long on a real major-league diamond, showing what they can do against similar talent. A youth-versus-veteran competition would offer the same kind of spectacle. A chance to see how well the recently devalued old-timers, with their guile and years of experience, do against the heralded up-and-comers, with their loose swings and swagger. Who wouldn’t want to watch that? It would give players left off the All-Star roster but having respectable seasons a chance for publicity, and help create more generational fraternity among players in a way not based on locker-room hazing. It would be fun — there are not many avenues for inexperienced labor to directly challenge their seniors in any field of work — and I think it would also be surprisingly competitive.

Let’s imagine what this might have looked like last year in San Diego, using WAR leaders (100 PA or 10 innings pitched minimums) through the first half of the season for players 24 and under and 34 and older, not in the All-Star Game. For the actual selections, if this event ever took place, each team could send a player that fits each age bracket, or status quo could be maintained and fans could vote (in which case, I have a feeling Bartolo would start every year he’s not an All-Star).

Graybeards

Starters

(C) David Ross

(1B) Adrian Gonzalez

(2B) Ian Kinsler

(3B) Adrian Beltre

(SS) Jimmy Rollins

(OF) Nelson Cruz

(OF) Ichiro Suzuki

(OF) Curtis Granderson

(DH) Jose Bautista

Reserves

(OF) Rajai Davis

(2B) Chase Utley

(2B) Aaron Hill

(C) Victor Martinez

(OF) Jayson Werth

(OF) Marlon Byrd

(1B) Mike Napoli

(3B) Juan Uribe

(OF) Matt Holliday

(1B) Albert Pujols

(OF) Ryan Raburn

(C) A.J. Ellis

(OF) Coco Crisp

(OF) Nori Aoki

(2B) Brandon Phillips

(C) A.J. Pierzynski

Pitchers

Rich Hill

Adam Wainwright

John Lackey

CC Sabathia

Colby Lewis

Jake Peavy

Hisashi Iwakuma

R.A. Dickey

James Shields

Jonathan Papelbon (worth the price of admission)

Francisco Rodriguez

Brad Ziegler

Joe Blanton

Oliver Perez

Jason Grilli

Koji Uehara

 

Whippersnappers

Starters

(C) Christian Bethancourt

(1B) Miguel Sano

(2B) Jose Ramirez

(3B) Nick Castellanos

(SS) Trevor Story

(OF) Christian Yelich

(OF) Gregory Polanco

(OF) Joc Pederson

(DH) Javier Baez

Reserves

(2B) Jonathan Schoop

(3B) Maikel Franco

(2B) Rougned Odor

(SS) Tim Anderson

(2B) Jurickson Profar

(OF) Nomar Mazara

(OF) Michael Conforto

(OF) Max Kepler

(OF) Mallex Smith

(SS) Eugenio Suarez

(SS) Chris Owings

(OF) Byron Buxton

(1B) Tommy Joseph

(3B) Cheslor Cuthbert

(OF) Delino DeShields

(OF) Jorge Soler

Pitchers

Aaron Nola

Vince Velasquez

Carlos Martinez

Joe Ross

Lance McCullers

Michael Fulmer

Jon Gray

Robbie Ray

Carlos Rodon

Zach Davies

Matt Wisler

Julio Urias

Taijuan Walker

Blake Snell

Roberto Osuna

 

Those are pretty interesting lists of names. All-time greats like Ichiro and Pujols against great career starts like Odor and Story. The AL ROY (Fulmer), many former top prospects, and familiar names make up the pitchers. Almost all the elder statesmen have played in an All-Star Game before, and it’s likely many of the youngsters will get their chances soon. Some 2016 borderline All-Star snubs like Beltre, Kinsler, Yelich, Cruz, and Polanco would have had an opportunity to show what they can do in San Diego. Clearly, the lists show that shortstops don’t last as long and catchers take a while to mature. Youth is heavy on starters (suggesting they will either flame out or be converted to relievers) while age has more relievers sticking around, racking up WAR.

The W-Snappers’ position players edge the G-Beards in total WAR (26.7 to 19.2) and average wRC+ (103.8 to 98.0), but trail in the walk and strikeout rates (7.3 to 9.3 BB% and 23.4 to 18.2 K%) that tend to refine as players age. Counting stats show that under-24s lead in strength and speed (233 to 224 home runs and 96 to 78 stolen bases) despite having nearly 1,000 fewer total plate appearances. Age wins in the “old-school” counting stats RBI (850 to 787) and runs (864 to 799). These tallies and plate appearances suggest that teams continue to use their veteran players more often and higher up the lineup than might be prudent. But the game would be a chance to see if savvy situational hitting by aged hitters, in fact, met the eye test. Despite these stats, it would be hard to bet against a lineup with Bautista, Beltre, Kinsler, and Cruz, but I do wonder if that is only because I’ve been seeing those names for years. Surprisingly, both sides pull the ball and hit for hard contact at nearly the same rates (around 32 and 40% of the time respectively), although the younger players are luckier, with a 29-point higher BABIP (.317 to .288), likely due to their speed.

For pitchers, the two arsenals come in with nearly identical ERAs (3.9 for youth and 3.8 for age), BB/9 (around 3), and K/9 (around 9), but again the younger players have the edge in total WAR (21.9 to 15.8). Contact (77%), Zone (48%), Swing (46%), and Hard Contact (31%) rates are all uncannily similar across both teams. I suspect the similarities are because the sampling of player quality is roughly equivalent, but I had wondered if there would be more noticeable differences in how pitchers on opposite ends of the age spectrum were getting hitters out (nibbling and generating weak contact, for example).

To make the game more about the players, there would be an additional rule: player-managers for both sides. This would be a chance for managerial hopefuls like David Ross to audition and stir their dogged age-grades against the ravages of time. On the other side, young clubhouse leaders could emerge and rally their cohort against the stubborn establishment. Baseball is about rituals, and what is a more eternal ritual than coming-of-age ceremonies in which fathers initiate young men into adulthood, but not before a challenge of brawn? Imagine the storylines: brush-backs, pick-offs, and Ichiro beating out an infield single would all take on new meanings. Names would be made and stars would fade honorably into fatherly roles who could still show they had it.

Would players go for it? Probably not. They wouldn’t want to label themselves as old, and might see the game as a gimmicky sideshow to the weekend’s main attraction, where everyone would rather be playing. If it were going to work, it would need to be branded in a respectful way: MLB’s Mentorship Game (sponsored by the Boys & Girls Clubs of America!) between veterans and young guns. The players’ union would probably not like older players missing their chance to rest during the break, but it also might be enticing as an opportunity to demonstrate that both vets and youth have a place in the game, and that aging players should receive more contract interest and younger players should have more early-career leverage.

I highly doubt many emerging players would miss a chance to hang out with their elder heroes and show them up during All-Star weekend. So, the question is, what say you, Napoli and party — challenge accepted?


An Attempt to Quantify Quality At-Bats

Several of my childhood baseball coaches believed in the idea of “quality at-bats.” It’s a somewhat subjective statistic that rewards a hitter for doing something beneficial regardless of how obvious it is. This would include actions such as getting on base, as well as less noticeably beneficial things like making an out but forcing the pitcher to throw a lot of pitches. There is some evidence that major league coaches use quality at-bats and, through my experience working for the Florida Gators, I noticed that some college coaches like using it too. However, how it is used varies from coach to coach and it is a stat that is rarely talked about in the online community. Since there doesn’t seem to be a consensus on what a quality at-bat is, I decided to define a quality at-bat as an at-bat that results in at least one of the following:

  1. Hit
  2. Walk
  3. Hit by pitch
  4. Reach on error
  5. Sac bunt
  6. Sac fly
  7. Pitcher throws at least six pitches
  8. Batter “barrels” the ball.

There is some room for debate on a few of these parameters (e.g. if six pitches is enough, whether or not sacrifices should be included, etc.). However, in my experience this is roughly in line with what most coaches use, and I think it does a good job of determining whether or not a hitter has a “quality” at-bat. In my analysis I was excited to be able to include the new Statcast statistic, barrels. I have seen coaches subjectively reward a hitter with a quality at-bat for hitting the ball hard, but barrels gives us an exact definition of a well-hit ball based on a combination of exit velocity and launch angle.
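The eight-part definition above translates directly into a small classifier. This is only a sketch: the `PlateAppearance` record, its field names, and the outcome labels are hypothetical stand-ins, not the column names that Baseball Savant or baseballr actually provide.

```python
from dataclasses import dataclass

@dataclass
class PlateAppearance:
    """Hypothetical record of one plate appearance (field names are illustrative)."""
    outcome: str            # e.g. "single", "walk", "strikeout", "sac_fly"
    pitches: int            # number of pitches the batter saw
    barreled: bool = False  # whether the batted ball met the Statcast barrel definition

# Outcomes that count as quality on their own (items 1 through 6 in the list).
QUALITY_OUTCOMES = {
    "single", "double", "triple", "home_run",
    "walk", "hit_by_pitch", "reach_on_error",
    "sac_bunt", "sac_fly",
}

def is_quality(pa: PlateAppearance) -> bool:
    """A plate appearance is quality if it meets at least one criterion."""
    return pa.outcome in QUALITY_OUTCOMES or pa.pitches >= 6 or pa.barreled

def quality_pct(pas) -> float:
    """Quality at-bats as a percentage of total plate appearances."""
    return 100 * sum(is_quality(pa) for pa in pas) / len(pas)

# Made-up sample: three of these five plate appearances qualify.
sample = [
    PlateAppearance("single", 2),
    PlateAppearance("strikeout", 3),
    PlateAppearance("groundout", 7),                # long at-bat
    PlateAppearance("flyout", 2, barreled=True),    # hard-hit out
    PlateAppearance("strikeout", 4),
]
print(quality_pct(sample))
```

Keeping the criteria in one function makes the debatable thresholds, such as the six-pitch cutoff, easy to change and re-run.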

The first player I used to test this definition was Billy Hamilton. Hamilton is a player that has always interested me, partially because stealing bases is entertaining, but also because there has always been speculation about whether or not he will ever be able to develop into an average hitter. I also find him interesting because his career has consisted of one awful offensive season sandwiched between two less horrible but still sub-par offensive seasons. His wRC+ in 2014 was 79, in 2015 it was an unsightly 53, and in 2016 it was back up to 78. I thought that his quality at-bat percentages might be able to give us a clue as to whether or not he could become a better hitter. By pulling Baseball Savant data from Bill Petti’s amazing baseballr package, I counted all of Billy Hamilton’s quality at-bats in each of his three MLB seasons. I then divided those quality at-bat totals by his total plate appearances to get his quality at-bat percentages:

2014:  41.75%

2015:  42.28%

2016:  47.52%

It is never ideal to make sweeping conclusions about statistics — especially new ones that are not widely used or understood — without putting them in context. However, at the very least, I think it is a good sign that Billy Hamilton has experienced an upward trend in his quality at-bat percentages. Based on my definition, these results show that he is making more effective use of his at-bats and that he is continuing to develop as a hitter.

To put Hamilton’s scores in some context, I calculated the quality at-bat percentages for several other players and provided them below. I have not had a chance to run every player as of yet, but I think this chart can give you a feel of where Billy Hamilton stands compared to other players. It is also interesting to point out Jason Heyward’s large drop-off in quality at-bat percentage. This is yet another indicator of how poor his 2016 season was. Additionally, and not surprisingly, Joey Votto and Mike Trout have, relatively, very high quality at-bat percentages, while Adeiny Hechavarria (a player who had a wRC+ just north of 50 last season) had a quality at-bat percentage well below that of even Billy Hamilton.

 

Quality at-bat percentages

Year   Billy Hamilton   Mike Trout   Jason Heyward   Joey Votto   Adeiny Hechavarria
2014   41.75%           56%          47%             56%          41%
2015   42.28%           55%          48%             56%          42%
2016   47.52%           58%          40%             59%          39%

 

There is more research that needs to be done here in order to make more intelligent conclusions. I would like to run more players through my statistic, including minor leaguers, to see just how well quality at-bats can be used in evaluating talent, development, and predicting future success. I believe that quality at-bats are something that could be relevant in many of the same ways as quality starts. Neither of these statistics informs you of the nuances that make a player great (or not so great), but they do give you an idea of a player’s reliability in having a passable performance. I believe that with further analysis into quality at-bat percentages using the definition I created, we may be able to learn more about how hitters make use of each and every at-bat.


Examining the Tendencies of the Rockies’ Rotation

Don’t you just love how talking about one topic in baseball can bring you to a completely separate topic than the one you were discussing? For instance, my friend and I were discussing possible landing spots for Mark Trumbo (before he decided to head back to Baltimore). One team that came up was the Colorado Rockies and how they shouldn’t have signed Ian Desmond and should’ve gone with Trumbo instead. This led to talking about the Rockies’ rotation and the fact that it wouldn’t matter what sluggers they had if the rotation was — for lack of better words — “trash.” This led me to think what I’m sure many of you are wondering: How is the Rockies’ starting rotation?

Now, we can look at ERA, FIP, and whatever advanced metric you prefer until we’re blue in the face. But what I wanted to focus on is what type of pitchers they bring into Coors Field, mainly in regard to batted-ball statistics. I want to see if the front office prefers to bring in ground-ball pitchers to combat the altitude and ballpark factors of the stadium. I also want to take a look at the pitch mix of their starting five to see if that has a hand in how their rotation is selected.

One would imagine that a pitcher with a good mix of ground balls and fly balls would be preferred in a starting rotation. Too many ground balls and you have a better chance of giving up more hits. Too many fly balls and you risk giving up more home runs. As the library on FanGraphs says, “If you allow 10 ground balls, you can’t control if zero, three, or nine go for hits, but you did control the fact that none are leaving the park.” Considering a park with the altitude and home-run factor of Coors Field, you would expect the Rockies to favor a rotation of primarily ground-ball pitchers to lessen the chance of a home run.

Let’s look at Tyler Chatwood and Chad Bettis first. Chatwood and Bettis have very similar stats across the board, and they are the only two above-average ground-ball pitchers in the rotation. Their HR/FB% figures are close and below league average, but they differ in their home and away splits. While Chatwood seems to get lit up at home, Bettis goes in the opposite direction and actually has more fly balls go for home runs when he isn’t starting in Colorado.

Now let’s look at Jorge de la Rosa. Jorge has the worst HR/FB% of any starter on the team, by far. In fact, he was ranked 20th overall in 2016 for HR/FB%. Another stat that Jorge is last in for the starting rotation? Fastball usage, and by a considerable margin. For all MLB starting pitchers with a minimum of 60 IP, he ranks fifth-last in fastball usage in 2016. Maybe this is why the Rockies prefer to stick with fastball-type pitchers. Since 2011, the Rockies have used 21 different starting pitchers. Of those 21, 13 (62%) have been above the league average in fastball usage. In the four years that Jorge has been used as a starter, he’s sat at the bottom of the list three times (he was ranked eighth-last in 2013).

Something else I found noteworthy in the chart is that all five starters have higher fly-ball rates when pitching away as opposed to at home. While the difference for Tyler Anderson is minuscule (0.2%), the fact that all five meet this criterion makes it seem more than coincidental. Could they be pitching differently at home than they are when they’re away? Let’s take a historical look.

According to Baseball-Reference, this is the list of the most common Colorado Rockies starting pitchers from 2011 – 2016. The list gives us 30 total pitcher-seasons and 21 unique pitchers. Out of the 30 pitcher-seasons listed, 21 (70%) have a lower fly-ball rate at home than they do when pitching away. Additionally, 23 (77%) have a higher ground-ball rate at Coors as opposed to any other stadium. This leads me to believe that Rockies pitchers are conditioned to pitch differently when they are at home versus when they are away. This would make sense, since Coors has the highest park factor in all of baseball and anyone from a fair-weather fan to a front-office executive understands that keeping the ball on the ground in that park is best.

The last question we have to ask is, “Is this change effective?” The short answer is, not really. As seen, 14 out of the 30 (47%) pitchers have a higher HR/FB% when pitching away, while 15 out of the 30 (50%) pitchers have a higher HR/FB% when pitching at home (Eddie Butler in 2015 is the odd man out at an even 0.00%). The good news is that four out of the five latest seasons have the Rockies’ starting rotation having a lower HR/FB% than the league average for starting pitchers. The bad news is that all five seasons were losing seasons.


Happy Trails, Josh Johnson

Josh Johnson could pitch. In this decade, seven players have put up a season in which they threw 180+ innings with a sub-60 ERA-: Clayton Kershaw (three times), Felix Hernandez (twice), Kyle Hendricks and Jon Lester in 2016, Zack Greinke and Jake Arrieta in 2015, and Josh Johnson in 2010. That was the second straight excellent year for Johnson, making the All-Star team in both 2009 and 2010, and finishing fifth in the Cy Young balloting the latter year. Early in 2011 he just kept it going, with a 0.88 ERA through his first few starts. In four of his first five starts that year, he took a no-hitter into the fifth inning. Dusty Baker — a man who has seen quite a few games of baseball in his life and normally isn’t too effusive in his praise of other teams’ players — had this to say at that point:

“That guy has Bob Gibson stuff. He has power and finesse, instead of just power. That’s a nasty combination.”

It seemed like he was going to dominate the NL East for years to come.

Josh Johnson felt pain. His first Tommy John surgery was in 2007, when he was just 23. His elbow had been bothering him for nearly a year before he finally got the surgery. His manager was optimistic at the time:

“I think he’ll be fine once he gets that rehab stuff out of the way,” Gonzalez said. “You see guys who underwent Tommy John surgery, they come back and pitch better.”

But the hits kept coming. His excellent 2010 season was cut short because of shoulder issues (though he didn’t go on the DL) and his promising 2011 season came up short because of shoulder issues. Those same issues had been bothering him all season but he pitched through the pain for two months.

“It took everything I had to go and say something,” he said. “Once I did, it was something lifted off my shoulders. Let’s get it right and get it back to feeling like it did at the beginning of the season.”

“I’m hoping [to return by June 1st],” he said. “You never know with this kind of stuff. You’ve got to get all the inflammation out of there. From there it should be fine.”

That injury cost him the rest of the season.

Josh Johnson loved baseball. Think about something you loved doing, and your reaction if someone told you that you had to undergo painful surgery with a 12-month recovery time in order to continue doing it. Imagine you did that, but then later on, someone told you that you had to do it again if you wanted even an outside chance of performing that activity, but the odds were pretty low. Josh Johnson had three Tommy John surgeries, because they gave him a glimmer of hope of continuing to play baseball.

Josh Johnson had a great career. It’s only natural to look at a career cut short by injuries and ask “what if?” but he accomplished plenty. He struck out Derek Jeter and Ichiro in an All-Star Game, threw the first pitch in Marlins Park, and made over $40 million playing the game he loved. He even lucked his way into hitting three home runs. Now he’s a 33-year-old millionaire in retirement; I think he did all right.


Running Into an Out as a Strategy

I tried to come up with a witty preamble to this but all I could come up with was a lame story about playing RBI Baseball 4 against my older brother. And unless you have mistakenly come to FanGraphs while trying to get to Farmers Almanac (no judgments, Google auto-complete can be weird sometimes) then you probably don’t care about that. So let’s dispense with the amusing introduction and get right to the question. (Or did I just subversively come up with a witty preamble by explaining how I did not have a witty preamble?!)

Scenario:

Runner on first with two out. 0-2 count.

Now anyone who is even slightly familiar with baseball will tell you that this is not a good situation for the offence. Those who are very familiar with baseball to the point that they read things like this post will probably even quote the run expectancy matrix to demonstrate how bad of a situation this is for the offence.

So, yeah, not looking good for the offence. The chance of scoring a run from that base/out state is 0.127. And that is without even accounting for the 0-2 count which obviously makes things worse. MLB as a whole slashed .155/.187/.237 with a 47.6 K% and a 10 wRC+ last year through two-out, 0-2 situations with runners on. In other words, the batter made the third out ~80% of the time. Even Mike Trout, who is Baseball Jesus, strikes out over half the time in 0-2 counts and is running a tOPS+ that is almost single digits. For all intents and purposes, the inning is likely over when it hits that situation.

But the team at the plate is not totally powerless. It can still decide how to end the inning, and it could do so in a way that gives it a more favourable outcome. Which brings me to the crux of this argument:

Why not have the guy on first just take off running?

Before the pitcher even comes set, just take off for second. Worst case, they tag him out and the inning ends (which was the most likely outcome anyway), but now the guy at the plate leads off the next inning in a fresh count, which is obviously a much more favourable scenario for a hitter. And best case, the defence screws up and the runner is now on second. Granted, that is an extreme outcome, and even two out and runner on second is still not a great scoring scenario. But referring back to the run expectancy matrix, it’s ~50% higher than when he was standing on first.

If the outcome of the scenario is almost overwhelmingly going to be an out, then you are not really giving away an out as much as you are just deciding who takes the out. If you have a good hitter at the plate, why have him continue to hit in what is a pretty futile situation, and waste one of his limited PAs, when you can reset the situation and give him what amounts to an extra PA by having the runner take the out instead?

Let’s look at Mike Trout’s career as an example since, well, since it’s fun to look at Mike Trout’s numbers.

No surprise, Mike Trout is a much, much, much better hitter overall than he is in 0-2 counts. Every hitter is. Now let’s also check back in with our friend, the run expectancy matrix.

So right off the bat (no pun intended), we see that the chances of scoring a run at the start of any inning are considerably better than scoring a run with two outs and a runner at first. Add in the fact that you have a very good hitter leading off in Trout and things have seemingly changed significantly for the better, simply by having your base-runner act like an 11-year-old exchange student on the base paths.

If Trout does anything to get on first (single, walk, HBP, dropped third strike, coming to the plate and performing a stand-up routine that is so good the opposing team just awards him first as a thank you, etc etc), now all of a sudden the chances of scoring a run in the inning have gone up to 0.416. Given that Mike Trout got on base nearly 45% of the time last year and is around 40% for his career, it seems like a fairly reasonable outcome. So by having your base-runner deliberately make an out to end the previous inning and saving Trout from doing so, you have gone from a situation where you had a .127 (or lower given the fact that the 0-2 count is not accounted for in the matrix) chance of scoring a run and your best hitter producing an out to a situation where you very likely have a 0.416 chance of scoring a run. And that does not even account for all the other things Trout might do in this new PA. If he hits a lead-off double, your chances of scoring a run in the inning are now 0.614. If he hits a lead-off home run, your chances of scoring a run are….hold on, where is my calculator? Plus, you have also avoided what was highly likely an out for your best hitter and having to wait two or three innings for him to bat again.
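To put a rough number on that, here is a back-of-the-envelope sketch using only the probabilities quoted in this post. The simplifications are mine: every time on base is treated as reaching first, and innings in which the lead-off hitter makes an out are counted as zero, so the result is a floor rather than an estimate.

```python
# Scoring probabilities quoted in the post (from its run expectancy matrix).
P_SCORE_FIRST_TWO_OUT = 0.127   # runner on first, two out
P_SCORE_FIRST_NO_OUT = 0.416    # runner on first, nobody out

def leadoff_floor(obp: float) -> float:
    """Lower bound on the chance of scoring when this hitter leads off.

    Simplifying assumptions: every time on base is treated as reaching
    first, and innings where he makes an out contribute nothing.
    """
    return obp * P_SCORE_FIRST_NO_OUT

# Trout's approximate career on-base rate, as quoted above.
career_floor = leadoff_floor(0.40)

print(round(career_floor, 3), round(career_floor / P_SCORE_FIRST_TWO_OUT, 2))
```

Even this deliberately pessimistic floor comes out above the .127 figure. It is not a like-for-like comparison, though, since standing pat also leads to a next inning with a different lead-off hitter.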

Last year, MLB teams averaged 219 PAs where they had runners on and an 0-2 count. As stated above, in that situation the hitter wound up making the third out ~80% of the time. So that is ~200 innings that could have started with a different guy at the plate and ~200 outs at the plate that could theoretically have been something other than an out. How many innings would have been different by simply giving up the runner for the third out and letting the hitter lead off the next inning in a more favourable count? If you have a good hitter at the plate and he is down 0-2, it might be a worthwhile strategy to just tell your base-runner to take off and let your hitter try again the next inning.

Or maybe I have had too much coffee today.


Hierarchical Clustering For Fun and Profit

Player comps! We all love them, and why not? It’s fun to hear how Kevin Maitan swings like a young Miguel Cabrera or how Hunter Pence runs like a rotary telephone thrown into a running clothes dryer. They’re fun and helpful, because if there’s a player we’ve never seen before, it gives us some idea of what they’re like.

When it comes to creating comps, there’s more than just the eye test. Chris Mitchell provides Mahalanobis comps for prospects, and Dave recently did something interesting to make a hydra-comp for Tim Raines. We’re going to proceed with my favorite method of unsupervised learning: hierarchical clustering.

Why hierarchical clustering? Well, for one thing, it just looks really cool:

That right there is a dendrogram showing a clustering of all player-seasons since the year 2000. “Leaf” nodes on the left side of the diagram represent the seasons, and the closer together, the more similar they are. To create such a thing you first need to define “features” — essentially the points of comparison we use when comparing players. For this, I’ve just used basic statistics any casual baseball fan knows: AVG, HR, K, BB, and SB. We could use something more advanced, but I don’t see the point — at least this way the results will be somewhat interpretable to anyone. Plus, these stats — while imperfect — give us the gist of a player’s game: how well they get on base, how well they hit for power, how well they control the strike zone, etc.

Now hierarchical clustering sounds complicated — and it is — but once we’ve made a custom leaderboard here at FanGraphs, we can cluster the data and display it in about 10 lines of Python code.

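import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
# Read the custom leaderboard exported from FanGraphs
df = pd.read_csv('leaders.csv')
# Keep only the numeric features used for clustering
data_numeric = df[['AVG', 'HR', 'SO', 'BB', 'SB']]
# Build the linkage array with Ward's method
w2 = linkage(data_numeric, method='ward')
# Label each leaf "Season Name" (assumes the CSV has Season and Name columns)
labels = (df['Season'].astype(str) + ' ' + df['Name']).tolist()
# Draw the dendrogram with leaves on the left, coloring clusters below distance 300
d = dendrogram(w2, orientation='right', labels=labels, color_threshold=300)
plt.show()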

Let’s use this to create some player comps, shall we? First let’s dive in and see which player-seasons are most similar to Mike Trout’s 2016:

2016 Mike Trout Comps
Season Name AVG HR SO BB SB
2001 Bobby Abreu .289 31 137 106 36
2003 Bobby Abreu .300 20 126 109 22
2004 Bobby Abreu .301 30 116 127 40
2005 Bobby Abreu .286 24 134 117 31
2006 Bobby Abreu .297 15 138 124 30
2013 Shin-Soo Choo .285 21 133 112 20
2013 Mike Trout .323 27 136 110 33
2016 Mike Trout .315 29 137 116 30
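
A comp table like this one can be pulled straight from the linkage array instead of being read off the plot. What follows is a minimal sketch, not necessarily how the table was built: it assumes the leaderboard has Season and Name columns, and the helper name comps_for and the max_distance cutoff are made up for illustration. It cuts the tree into flat clusters with scipy's fcluster and returns every row that lands in the same cluster as the target season. The four sample rows are copied from the tables in this post.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

FEATURES = ['AVG', 'HR', 'SO', 'BB', 'SB']

def comps_for(df, season, name, max_distance):
    """Return every row that shares a flat cluster with the target player-season.

    comps_for and max_distance are illustrative names, not part of the post.
    """
    Z = linkage(df[FEATURES], method='ward')
    # Cut the tree so that members of a cluster are within max_distance of each other
    cluster_ids = fcluster(Z, t=max_distance, criterion='distance')
    target = ((df['Season'] == season) & (df['Name'] == name)).to_numpy()
    target_cluster = cluster_ids[target][0]
    return df[cluster_ids == target_cluster]

# Four player-seasons taken from the tables in this post
sample = pd.DataFrame({
    'Season': [2016, 2004, 2000, 2002],
    'Name': ['Mike Trout', 'Bobby Abreu', 'Brian Giles', 'Brian Giles'],
    'AVG': [0.315, 0.301, 0.315, 0.298],
    'HR': [29, 30, 35, 38],
    'SO': [137, 116, 69, 74],
    'BB': [116, 127, 114, 135],
    'SB': [30, 40, 6, 15],
})

trout_comps = comps_for(sample, 2016, 'Mike Trout', max_distance=40)
print(trout_comps[['Season', 'Name']])
```

The cutoff does the real work here: a smaller max_distance gives tighter, shorter comp lists, and a larger one merges neighboring branches. Note too that the features are clustered on their raw scales, as in the snippet above, so the counting stats dominate AVG.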

Remember Bobby Abreu? He’s on the Hall of Fame ballot next year, and I’m not even sure he’ll get 5% of the vote. But man, take defense out of the equation, and he was Mike Trout before Mike Trout. The numbers are stunningly similar and a sharp reminder of just how unappreciated a career he had. Also Shin-Soo Choo is here.

So Abreu is on the short list of most underrated players this century, but for my money there is someone even more underrated, and he certainly pops out from this clustering. Take a look at the dendrogram above. Do you see that thin gold-colored cluster? In there are some of the greatest offensive performances of the past 20 years. Barry Bonds’s peak is in there, along with Albert Pujols’s best seasons, and some Todd Helton seasons. But let’s see if any of these names jump out at you:

First of all, holy hell, Barry Bonds. Look at how far separated his 2001, 2002 and 2004 seasons are from anyone else’s, including these other great performances. But I digress — if you’re like me, this is the name that caught your eye:

Brian Giles’s Gold Seasons
Season Name AVG HR SO BB SB
2000 Brian Giles .315 35 69 114 6
2001 Brian Giles .309 37 67 90 13
2002 Brian Giles .298 38 74 135 15
2003 Brian Giles .299 20 58 105 4
2005 Brian Giles .301 15 64 119 13
2006 Brian Giles .263 14 60 104 9
2008 Brian Giles .306 12 52 87 2

Brian Giles had seven seasons that, according to this method at least, are among the very best this century. He had an elite combination of power, batting eye, and a little bit of speed that is very rarely seen. Yet he didn’t receive a single Hall of Fame vote, for various reasons (short career, small markets, crowded ballot, PED whispers, etc.). He’s my vote for most underrated player of the 2000s.

This is just one application of hierarchical clustering. I’m sure you can think of many more, and you can easily do it with the code above. Give it a shot if you’re bored one offseason day and looking for something to write about.


Forecasting League-wide Strikeout and Homer Rates

Two of the more notable league-wide trends in MLB today are rising home run and strikeout rates.  Strikeouts have consistently trended upward over the past 35 or so years.  Home-run rate, meanwhile, has moved up and down a bit more, but has also increased during that span overall.

An accurate long-term forecast of trends such as these could be valuable.  As this Beyond the Box Score article illustrates, ideal roster construction changes in tandem with the league-wide run-scoring environment.  During periods where offense is scarce, power hitters see their value go up.  When offense is plentiful, speedy contact hitters become somewhat more valuable.

In the following paragraphs, I will attempt to project strikeout percentage and home-run rate — measured as plate appearances per home run — for the 2017-2026 seasons.  First I will take a univariate approach (i.e., use only past patterns in the data to predict future values). Then, I will try to improve the model by adding in an external regressor variable.

Strikeout Rate

First, here’s a plot of the raw data.

Strikeouts rose fairly steadily from the early 1920s to the late 1960s, dipped for about 10 years, then started to tick back up again around 1980.  They’ve been on the rise ever since, and at an especially accelerated pace since 2005.

I considered several classes of time-series models to represent this data, including Auto-Regressive Integrated Moving Average (ARIMA), exponential smoothing state-space (ETS), and artificial neural networks.  I used AICc to narrow down the field of models somewhat.  I then split the data into a training set and a test set, fit each remaining model on the training data, and evaluated its forecast accuracy based on mean absolute error and median absolute prediction error using a rolling forecast origin.

The data had to be differenced once to make it approximately stationary, after which there was little to no auto-correlation remaining.  Given this fact, it shouldn’t be too surprising that the best-performing model was a random walk with drift.  Below are forecasts from this model for the next decade, along with 80% and 95% prediction intervals.

Year Forecast Low 80 High 80 Low 95 High 95
2017 21.21 20.49 21.92 20.11 22.30
2018 21.31 20.30 22.33 19.76 22.87
2019 21.42 20.17 22.67 19.51 23.33
2020 21.52 20.07 22.97 19.31 23.74
2021 21.63 20.00 23.26 19.14 24.12
2022 21.74 19.94 23.53 18.99 24.48
2023 21.84 19.89 23.79 18.86 24.82
2024 21.95 19.86 24.04 18.75 25.15
2025 22.05 19.83 24.28 18.65 25.46
2026 22.16 19.80 24.52 18.55 25.77


The model projects a continued, but decelerated rise in K% relative to what we’ve seen the past decade.
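
For anyone who wants to see the mechanics, here is a minimal Python sketch of a random walk with drift. It is an illustration of the technique, not the code used for the table above. The function name is made up, the input series is a toy example rather than real K% data, and the intervals ignore the extra uncertainty from estimating the drift, so they will come out slightly narrower than what a forecasting package reports.

```python
from statistics import NormalDist
import numpy as np

def rw_drift_forecast(series, horizon, level=0.80):
    """Point forecasts and prediction intervals for a random walk with drift.

    Simplification: interval width is z * sigma * sqrt(h), which leaves out
    the uncertainty in the estimated drift term.
    """
    y = np.asarray(series, dtype=float)
    diffs = np.diff(y)                 # year-over-year changes
    drift = diffs.mean()               # average change per year
    sigma = diffs.std(ddof=1)          # spread of the changes around the drift
    z = NormalDist().inv_cdf(0.5 + level / 2)
    steps = np.arange(1, horizon + 1)
    point = y[-1] + drift * steps      # last value plus h years of drift
    half_width = z * sigma * np.sqrt(steps)
    return point, point - half_width, point + half_width

# Toy series for illustration only (not league strikeout rates)
toy_series = [1.0, 2.0, 4.0, 7.0]
point, low80, high80 = rw_drift_forecast(toy_series, horizon=2, level=0.80)
print(point, low80, high80)
```

The point forecast is just the last observed value plus a constant yearly step, which is why the projected K% climbs in a straight line while the intervals fan out with the square root of the horizon.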

Home Run Rate

I used the same general process to fit a model for the home run data, except I first utilized a Box-Cox transformation to stabilize variance.  This time, there was some auto-correlation that remained after differencing.  The best-performing model turned out to be an ARIMA(0,1,1).

Once again, 80% and 95% prediction intervals are given from that model along with the point forecasts.

Year Forecast Low 80 High 80 Low 95 High 95
2017 34.86 31.87 38.58 30.52 40.95
2018 34.86 31.39 39.37 29.85 42.36
2019 34.86 30.98 40.08 29.30 43.66
2020 34.86 30.63 40.74 28.83 44.91
2021 34.86 30.31 41.36 28.42 46.13
2022 34.86 30.03 41.96 28.04 47.32
2023 34.86 29.77 42.54 27.70 48.50
2024 34.86 29.53 43.10 27.39 49.69
2025 34.86 29.31 43.65 27.11 50.87
2026 34.86 29.10 44.19 26.84 52.06


The projection is flat, but with a decrease in home-run rate from one every 32.90 PA in 2016 to one every 34.86 PA going forward.  If plate appearances remain constant, this would mean a 315 home-run reduction across MLB, or just over 10 per team.
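
The arithmetic behind that reduction is easy to check. The sketch below assumes a 2016 league total of 5,610 home runs, a figure that is not stated in this post, and backs out plate appearances from the 32.90 PA/HR rate. The function name is made up for illustration.

```python
HR_2016 = 5610               # assumed 2016 MLB home run total (not given in the post)
PA_PER_HR_2016 = 32.90       # observed 2016 rate from the post
PA_PER_HR_FORECAST = 34.86   # flat ARIMA(0,1,1) forecast from the post
TEAMS = 30

def projected_hr_drop(hr_total, pa_per_hr_now, pa_per_hr_next, teams):
    """Home runs lost league-wide and per team if PA stay constant."""
    plate_appearances = hr_total * pa_per_hr_now
    projected_hr = plate_appearances / pa_per_hr_next
    drop = hr_total - projected_hr
    return drop, drop / teams

drop, per_team = projected_hr_drop(HR_2016, PA_PER_HR_2016, PA_PER_HR_FORECAST, TEAMS)
print(round(drop), round(per_team, 1))  # 315 10.5
```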

Modeling with Regressors

The difficult part with including regressors in the model is finding ones that are known into the future.  Exit velocity, for example, is something that would probably be quite helpful if you were trying to predict home-run rate.  However, since we don’t actually know what it will be in a given season until after that season is over, it doesn’t do much good for forecasting purposes.

One variable I was able to consider was the percentage of home runs and strikeouts in previous years that came from particularly young or old players.  My theory was that if an unusually high percentage of home runs (or strikeouts) came from players that were nearing the ends of their careers, league-wide numbers would be more likely to drop in the coming years (and vice versa if the sources of strikeouts or power were unusually concentrated among young players).

As it turns out, considering age was not especially useful when I back-tested the strikeout model.  Considering the number of old power hitters was not very useful either.  However, percentage of home runs that came from players under 25 was a significant predictor of home-run rate in future years.

I created a variable called “Youth Index” that averaged percentage of home runs from young players in the previous five seasons, weighted by their correlations to home-run rate in the season in question.  To avoid having to forecast Youth Index separately, I actually used a slightly different model for each step in the forecast, each considering only known data.  For example, for the 2017 forecast, data from each of the 2012-2016 seasons is available, but for the 2018 forecast, 2017 data is not.  Thus, the Youth Index predictor for 2018 used only data from 2-5 seasons back, the 2019 Youth Index predictor used only data from 3-5 seasons back, etc.  I limited the forecast to only five seasons ahead, by which point the model started to converge with the univariate forecast anyway.
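
The horizon-dependent version of the index can be sketched in a few lines of Python. This is one reading of the description above, not the exact calculation: it assumes the weights are the raw correlations normalized to sum to one, and every number in the example is a hypothetical placeholder rather than real data. Lags are counted back from the season being forecast, so lag 1 is the previous season.

```python
import numpy as np

def youth_index(young_hr_share_by_lag, correlation_by_lag, horizon, max_lag=5):
    """Correlation-weighted average of young-player HR share.

    Only lags from `horizon` through `max_lag` are used, since more recent
    seasons have not been played yet when forecasting that far ahead.
    """
    if not 1 <= horizon <= max_lag:
        raise ValueError("horizon must be between 1 and max_lag")
    lags = range(horizon, max_lag + 1)
    shares = np.array([young_hr_share_by_lag[lag] for lag in lags])
    weights = np.array([correlation_by_lag[lag] for lag in lags])
    return float(np.dot(shares, weights) / weights.sum())

# Hypothetical placeholder inputs, keyed by lag (seasons back from the target year)
example_shares = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40, 5: 0.50}
example_correlations = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}

one_year_ahead = youth_index(example_shares, example_correlations, horizon=1)
three_years_ahead = youth_index(example_shares, example_correlations, horizon=3)
print(one_year_ahead, three_years_ahead)
```

With fewer lags available at longer horizons, the index leans on older and older data, which fits with the regressor forecast drifting back toward the univariate one.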

Year Forecast Low 80 High 80 Low 95 High 95
2017 36.27 33.15 40.16 31.74 42.65
2018 36.25 32.84 40.61 31.32 43.45
2019 36.03 32.38 40.81 30.77 44.00
2020 35.59 31.71 40.77 30.02 44.31
2021 35.67 31.37 41.62 29.54 45.84

*Note: the red and green lines are 80% and 95% prediction intervals just like on the other graphs.  It only looks different because I created this graph manually rather than using an R-package.

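The updated forecast projects a more aggressive rebound in PA/HR (i.e., decrease in home-run rate).  The difference overall in the two forecasts is not huge, but not nothing either.  Interestingly enough, the model is over 90% confident that PA/HR will rise to some degree or another next season: the lower bound of the 2017 80% interval (33.15) already sits above 2016’s 32.90, which leaves less than 10% of the probability below last year’s mark.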

Ultimately, both home run and strikeout rate are influenced by a wide array of factors, many of which are difficult or even impossible to consider in a long-ish term forecast like this.  The confidence bars aren’t quite as narrow as I’d like, which suggests the observed data may end up deviating quite a bit from these projections.  Nonetheless, I think this is a good starting point.


Searching For Overvalued Pitchers

A little while ago, I created a post here about finding undervalued pitchers by looking at improvements between the first and second halves of the season. I had created a linear regression model for the predictions using data from 2002 to 2015, but when trying to use the same model to find overvalued pitchers, it didn’t exactly work as expected (I use the word “work” loosely here — in all likelihood, my predictions will fail as badly as the new Fantastic Four movie). It did find pitchers who suffered massive setbacks, but the majority of those were primarily due to increased — and probably unsustainable — home-run rates.

For example, Matt Andriese had an extremely successful first half of 2016. He put up a 2.77 ERA in 65 innings, backed up by a 2.85 FIP. But those numbers were much like my ex-girlfriend: pretty on the surface, but uglier once you get to what’s underneath. He struck out a lower percentage of batters than the average pitcher during that time while giving up more hard contact. The biggest sign, though, was his deflated home-run rate. He allowed just 0.28 home runs per nine innings, with only 3.2 percent of his fly balls going over the fence. This righted itself in the second half, where his HR/9 increased to 2.15 and his HR/FB to 17.4 percent. On the other hand, he improved his strikeout and walk rates, actually leading to a drop in his xFIP from 4.04 to 3.92 from the first half of the season to the second.

So then what should we expect from Andriese in 2017? The model I created predicts a 5.56 ERA from Andriese, leaning toward his 6.03 ERA from the second half of last season. While it’s unlikely he will allow fewer than 0.3 home runs per nine innings next year, it’s equally unlikely that he’ll allow over 2; after all, no qualified pitcher did so over the course of the 2016 season. Andriese’s full-season FIP of 3.78 actually closely aligned with his xFIP of 3.98, so it’s fair to guess that his home-run rates will level out and his ERA in the coming year will be in that range. That would signify an improvement from his 2016 season, rather than the decline predicted by the model.

So, instead of using the model, I took a simpler approach. Here are the players with at least 50 IP in each half of the 2016 season whose xFIP increased the most from the first half to the second:

xFIP Splits
Name First Half xFIP Second Half xFIP Increase
Tanner Roark 3.64 4.83 1.19
Drew Smyly 4.07 5.10 1.03
Hector Santiago 5.05 5.94 0.89
Aaron Sanchez 3.41 4.29 0.88
James Shields 4.82 5.70 0.88
David Price 3.12 3.98 0.86
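
The ranking itself is a one-liner once the splits are in a table. Here is a minimal pandas sketch using the six rows above. The column names and the function name are made up for illustration, and the 50-IP-per-half filter is assumed to have been applied before this step, since innings are not listed in the table.

```python
import pandas as pd

# First- and second-half xFIP for the six pitchers in the table above
splits = pd.DataFrame({
    'Name': ['Tanner Roark', 'Drew Smyly', 'Hector Santiago',
             'Aaron Sanchez', 'James Shields', 'David Price'],
    'xFIP_1H': [3.64, 4.07, 5.05, 3.41, 4.82, 3.12],
    'xFIP_2H': [4.83, 5.10, 5.94, 4.29, 5.70, 3.98],
})

def rank_by_xfip_increase(df):
    """Add the half-to-half xFIP change and sort from largest increase down."""
    ranked = df.assign(Increase=(df['xFIP_2H'] - df['xFIP_1H']).round(2))
    return ranked.sort_values('Increase', ascending=False).reset_index(drop=True)

ranked = rank_by_xfip_increase(splits)
print(ranked)
```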

For the purposes of this article, I’ll ignore Santiago and Shields since it’s unlikely that either of them will be relevant in 2017. That leaves four other pitchers whose skills declined dramatically over the course of the season and who you might want to avoid in your drafts.

Tanner Roark

Believe it or not, Roark’s already 30 years old. He’s actually had pretty decent success in his four years in the majors, with a 3.01 career ERA in over 573 innings. On the flip side, over that same time he has a 3.73 FIP, 3.96 xFIP and 4.06 SIERA. That’s not to say he’s a bad pitcher, just perhaps not as good as his ERA would have you believe. The same can’t be said for his second half of 2016. Despite actually bringing his ERA down from 3.01 to 2.60, his already-inflated FIP and xFIP numbers got even worse. His strikeout rate declined by 2.5 percent while his walk rate rose by about the same amount, leading to just a dismal 1.87 K/BB in the second half. His HR/9 nearly doubled as well, but not due to a substantial increase in his HR/FB rate; rather, his fly-ball rate rose from 26 to 37.6 percent, more in line with his pre-2016 career average of 33.9 percent. Why, then, was he able to continue to be successful? A .230 BABIP and an 86 percent strand rate offer an answer. Don’t expect another sub-3 ERA season from Roark. Instead, look more toward his Steamer projection of 4.15.

Drew Smyly

For many last year, Smyly was a popular target. He was a high-strikeout guy who was able to limit walks and generate infield flies, prompting Mike Petriello to write this ringing endorsement for him. In his 114 1/3 innings for Tampa Bay before 2016, Smyly had maintained a 2.52 ERA and was among the best at generating strikeouts. But it all went wrong last year. As Tristan Cockcroft points out, Smyly’s season was marked by a first half of bad luck and a second half of deteriorated skills but better luck. His first-half 5.47 ERA was likely undeserved, as he continued getting strikeouts and limiting walks, but was plagued by a .313 BABIP, 63.2 percent strand rate and a 15.0 percent HR/FB rate, which corresponded to a 4.45 FIP and 4.07 xFIP. His ERA dropped to 4.08 in the second half, but nearly all of his peripheral stats worsened. A move to Seattle won’t fix all his problems, as Safeco Field was actually more hitter-friendly than Tropicana Field in 2016. The sky is the limit for Smyly, but there’s reason to be cautious. It’s possible he bounces back, but this could be who he is now.

Aaron Sanchez

This guy is good, don’t get me wrong. It took a while for some people to catch on, but I was always on his bandwag…all right, so I was one of the guys who didn’t buy in right away. That’s why I don’t do this for a living. Anyway, seeing his name on this list surprised me. After some digging though, it turns out that in my ignorance, I may have been onto something. In 2015, in Sanchez’s trial run as a starter, he was all right. A 3.55 ERA hid a 5.21 FIP and 4.64 xFIP before he got injured and was subsequently moved to the bullpen. When he returned on July 25, he was a completely different pitcher. This time, while he may not actually have deserved his 2.39 ERA, a 3.10 FIP and 3.33 xFIP showed he had made some kind of improvement. Or had he? After all, he only threw 26 innings in the second half of that season. And while there was undoubtedly a huge improvement for him in strikeout and walk rates, something else caught my attention. Take a look at Sanchez’s batted-ball type percentages from 2015:

Pretty clearly, Sanchez improved his batted-ball profile after becoming a reliever. His 2015 second-half ground-ball percentage of 67.6 percent would be the greatest of all of the 1,281 qualified pitcher-seasons since 2002, when the statistic started being tracked. His fly-ball percentage of 18.3 percent, while not as extreme, would still rank as the ninth-lowest since 2002. That raises the question: would he be able to sustain those rates when he moved back to the rotation? The answer, as it always is with historically extreme rates, was no:

Both of his rates came crashing back to historically-accurate norms pretty much right away, and they continued to trend in the wrong direction as the season progressed. This, consequently, caused Sanchez’s xFIP to skyrocket. His strikeout and walk rates got worse from the first half of the 2016 season to the second, but only slightly. What really moved his xFIP was his fly-ball rate, which soared (pun intended — maybe I should do this for a living) from 21 percent to 31.8 percent. It’s difficult to say where Sanchez will go from here — after all, this was his first full season as a starter. If he can keep his fly-ball rate at last year’s 25.1 percent — which ranked fourth-lowest among qualified starters — he could still be a pretty decent starting pitcher, even with regression to a league-average HR/FB rate. What’d be even more impressive, though, is if he could keep his batted-ball rates at his numbers from the first half of 2016, which were among the league’s best. Perhaps with a full season under his belt, Sanchez may now have the stamina and endurance to achieve this feat. If he does, look out. If he doesn’t, you’re looking at an average guy.

David Price

Now that I’ve written nearly an entire article’s worth about one guy, let’s talk about another player from the AL East. Price, for much of his career, has been among the elite at the position. Before last season, the only time he had had an ERA above 3.50 was his first season as a starter back in 2009. Every year of his career, he’s been an above-average strikeout guy, but he topped even his own lofty standards when he struck out 27.1 percent of the batters he faced in the first half of 2016. He was unable to sustain that rate, and in the second half of the season he managed to strike out just 20.3 percent of batters, which would have been his lowest full-season rate since 2009. So what changed? Actually, it might have been the first half that was the fluke. Price allowed a 74.2 percent contact rate in the first half, contrasted with a 79.1 percent rate in the second. Those numbers don’t necessarily mean much on their own, but the difference is easy to spot when looking at his career rates:

Price’s whiff rate was higher than ever in the first half of 2016, but it’s tough to figure out why. Per Brooks Baseball, Price was generating swings and misses on his changeup at a career-best rate in the first half, but I couldn’t find any obvious changes to his velocity or movement on the pitch or any other. It’s fair to wonder, then, if his second-half numbers are what we should expect from Price at this point in his career, since his contact rates during that time were much more sustainable. He probably won’t be as bad as his 2016 3.99 ERA, but I wouldn’t be shocked to see it end up above 3.50 for the second year in a row.

Of course, this is not a comprehensive way to find overvalued pitchers. It’s a crude approach, but one that’s meant to highlight guys who fell off in the second half, as they’re the ones more likely to carry over those declined skills into 2017. That being said, xFIP obviously isn’t perfect, and these players all showed that they were capable of posting above-average results over half a season. Take a risk on them if you want, but be warned that they may not be worth the price.