Author Archive

How the Shift has Changed the Game

The shift is one of the most discussed changes in baseball in many years, and probably the biggest purely defensive change in decades. Commissioner Manfred has publicly stated that he dislikes it. Players are actively working with hitting coaches to beat the shift, people keep asking how it can be beaten, and some are starting to deny that it can. FanGraphs commenters predict that the shift will be bad for baseball, because less offense is less fun.

But just how big is the shift? Just how much has it changed the league?


Zero.

Okay, “zero” is too strong. It might have changed something, but if it has, we can’t tell.

Okay, that too is too strong. But the number of obvious statistical correlates of an effective shift, in league-wide stats, is zero. Maybe we can tell, but only through serious data-mining that goes beyond obvious results, like the number of outs, even in splits, since teams started shifting. There is no evidence of a change in the league-wide stats you would expect the shift to change. BABIP is unchanged. Grounder BABIP is unchanged. Left-handed-batter BABIP is unchanged. In fact, BABIP is higher today than it was 40 years ago, but it inflated by about .020 from the 1970s to the 1990s and hasn’t evidently changed since.

The shift is a defensive strategy whose intent is to depress run expectancy on balls in play. If the strategy works, its likeliest signature would be an increase in outs on balls in play. Here is a table of league BABIP since 1995, the last 20 years:

Year    BABIP
1995   0.298
1996   0.301
1997   0.301
1998   0.300
1999   0.302
2000   0.300
2001   0.296
2002   0.293
2003   0.294
2004   0.297
2005   0.295
2006   0.301
2007   0.303
2008   0.300
2009   0.299
2010   0.297
2011   0.295
2012   0.297
2013   0.297
2014   0.299
2015   0.299
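For reference, the stat in the table above can be reproduced from standard league totals. A minimal sketch of the conventional BABIP formula, (H − HR) / (AB − K − HR + SF); the league totals in the example are made-up placeholders, not actual 2015 figures:

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play:
    non-HR hits over at-bats that end with the ball in play."""
    return (h - hr) / (ab - k - hr + sf)

# Made-up league totals, for illustration only
print(round(babip(h=42000, hr=4900, ab=165000, k=37000, sf=1200), 3))  # 0.298
```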

The apparent trend is obvious, if something can be obviously non-existent.

We can look deeper: how have lefties, whom the shift allegedly affects more, been hurt by it? Well, in 2015 lefty hitters posted their highest BABIP (.301) against lefty pitchers in the last 13 years (as far back as FanGraphs data goes for that split). Against right-handed pitchers, left-handed batters tied their second-worst season (.299) of the last 15 years, a whopping one hit in 500 below the average over that span (.301).

You see, the objection goes, the problem is that we need to look at grounders: fly balls and line drives aren’t really being affected, but grounders are, so in the long run the shift is slightly depressing hits. Except that obvious correlate isn’t there either. In 2015, grounders had a .236 BABIP, .004 higher than the 13-year average.

2015 isn’t some sort of outlier. In every easy-to-research split you might choose, BABIP fluctuations over the last 13 years are within the range of random variation. The recent, shift-era years don’t show even a statistically insignificant decrease in BABIP: in many of those splits, BABIP has increased by a hair. (See the tables linked below.)

Another source of evidence that the shift works might come from comparing defense-independent pitching models with defense-dependent stats. Maybe BABIP leaves something out, but runs could be down relative to DIPS predictions; if so, one possible explanation would be the shift. FIP, a popular DIPS metric, is equal to (13*HR + 3*BB − 2*K)/IP + C, where C is a constant that makes league-average FIP equal league-average ERA. If C were smaller now, that would suggest (but not prove) that outs on balls in play have increased. In fact, C is bigger now (by about .048 in ERA terms, or .0053 runs per inning), suggesting that more runs are scored on balls in play. That’s no proof, but if balls in play were turning into outs a lot more often, we wouldn’t expect them to account for more runs overall, and ERA would be down more than the peripherals imply.
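The constant can be recovered directly from league totals. A minimal sketch using the simplified FIP formula above (FanGraphs’ published version also folds HBP in with walks); the league totals here are made up for illustration:

```python
def fip_constant(lg_era, lg_hr, lg_bb, lg_k, lg_ip):
    """C is whatever makes league-average FIP equal league-average ERA."""
    return lg_era - (13 * lg_hr + 3 * lg_bb - 2 * lg_k) / lg_ip

def fip(hr, bb, k, ip, c):
    """Simplified FIP: (13*HR + 3*BB - 2*K)/IP + C."""
    return (13 * hr + 3 * bb - 2 * k) / ip + c

# Made-up league totals: by construction, league FIP comes out equal to league ERA
c = fip_constant(lg_era=4.00, lg_hr=4900, lg_bb=15000, lg_k=37000, lg_ip=43000)
print(round(fip(4900, 15000, 37000, 43000, c), 2))  # 4.0
```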

We can’t infer from this data that individual hitters are unaffected by the shift. Jeff Sullivan’s recent piece on adjusting to the shift is what brought me to the data (I set out to investigate just how badly lefty hitters have been hurt and discovered something far more interesting), and he mentioned Jimmy Rollins’ attempts to adjust to the shift. I recall a lot of speculation about Mark Teixeira being hurt by the shift. Maybe those guys are. Maybe they aren’t. Maybe they aren’t, but others yet to be named are. Things with no league-wide effect may still interact with particular skillsets in hard-to-identify ways.

It’s possible that the shift has changed things by reducing the value of range up the middle, allowing more offensively-oriented players to man those positions. But that seems more like an effect that we would see in future, not one we have seen, because it should take years of player development for those sorts of changes to have a league-wide effect.

It is possible that the shift increases strikeouts and depresses walks. It would be hard to know this, though. It is also possible that the shift has reduced the value of certain defensive skills (e.g., range) and that the decreased need for range has allowed teams to play more offensively-oriented guys up the middle, effectively cancelling the BABIP effects. It sounds farfetched to suppose that two of eight hitters being more offensively-minded can cancel an effect of a shift that should apply to eight of eight of them, but we haven’t ruled it out.

Overall, league scoring is down, but DIPS suggest this is mostly the result of more strikeouts, with a little home-run and walk noise thrown in. There are still ways in which the shift might be having an effect; please offer further hypotheses below. All the evidence here is correlational, and correlation doesn’t imply causation. Even anti-correlation doesn’t imply non-causation: if people who drink more also exercise more (both correlate positively with wealth), drinking might come out anti-correlated with bad health because exercise compensates for its impact. But when no correlation is found and no obvious counter-effect can be cited, the lack of a correlation suggests weak influence at best.


League BABIP, 1975 to 2015

LHB v. LHP and LHB v. RHP, all available years

Ground Ball BABIP, all available years

Heyward, Stanton, and 20-year-old studs

Eno Sarris’s recent article on Jason Heyward comps got me thinking about comps. It also happened to coincide with the day I got my Baseball-Reference subscription. That I would start looking at seasons from 20-year-olds was inevitable.

It was maybe the third or fourth thing I noticed: 2010 featured another remarkable season from a 20-year-old hitter: Mike Stanton. Here’s a fun fact about Heyward: among 20-year-olds, only two guys walked in a greater share of their plate appearances than the Braves’ young stud (Ted Williams and Mel Ott). Here’s a fun fact about Stanton: the 20-year-old closest to him in home runs per batted ball is Mel Ott, but Stanton sent a greater percentage of his batted balls over the fence than any age-20 hitter in the Retrosheet era. (Perhaps less fun: he has the highest K% among 20-year-olds too.)

But who are the players most comparable to Stanton and Heyward? To answer this question, I focused on the three true-outcome rate stats (which are more stable in small samples than ball-in-play stats) in seasons from 20-year-old hitters, regardless of experience. While it’s tempting to focus on rookies, there are just 102 seasons of 200+ PA from 20-year-olds since 1920, so restricting the pool to similarly young rookies would shrink an already small group. To expand the group a little, I added 21-year-olds in their first seasons (also cut off at 200 PA).

To compare these players, I developed z-scores for players’ BB/PA, K/AB, and HR per batted ball (AB − K). (See the technical section below on these scores.) Then, treating each player’s three z-scores as a vector, I found the distance of that vector from Heyward’s and Stanton’s vectors. The smaller the distance, the more comparable the players.
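The comp method can be sketched in a few lines. The z-score vectors in the example are hypothetical stand-ins, not the players’ real standardized stats:

```python
import math

def z_scores(values):
    """Standardize a sample to mean 0, (population) sd 1."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

def distance(a, b):
    """Euclidean distance between two z-score vectors:
    smaller distance = more comparable players."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical z-score vectors (BB/PA, K/AB, HR per batted ball), not real stats
heyward = [2.1, 0.3, 0.8]
ott = [2.4, -0.5, 1.5]
print(round(distance(heyward, ott), 2))  # 1.1
```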


Strasburg’s Debut vs. 29 Other Clubs

What if Stephen Strasburg had debuted against a team that can actually hit? In what should strike anyone as a ridiculous criticism, a few people have pointed out that Strasburg didn’t exactly face a “real” major league lineup in his record-breaking major league debut. (A friend of mine joked that he made his 7th AAA start on Tuesday.) Yes, record-breaking. I’ll add to Jack Moore’s point: since 1920, there have been only 67 games in which a pitcher struck out 14 or more batters and walked none. Strasburg’s outing is the only one of them that happened with 24 or fewer batters faced. Given that only 66 other pitchers have ever done anything like what Strasburg did in his debut, the level of the team he faced seems like a pretty trivial point. This was dominance like we rarely ever see.

Nevertheless, what if Strasburg had faced a “real” lineup in his debut? One of the beauties of sabermetrics is that we get to have this argument with math. If you think that, against a real lineup, he would have looked ordinary, and I think he would have looked pretty amazing, we can set aside arbitrary opinions, lay out some points of agreement, and use our calculators to answer the question. Well, that’s a stretch. But at least we can get a sense of the difference the Pittsburgh lineup made.

Strasburg was obviously on last night. Did he bring his best stuff? Maybe. What we saw last night wasn’t his true talent level; nobody is that good consistently. Let’s call the talent level Strasburg brought to his debut his instantaneous talent level. That instantaneous talent level was deployed in the run-scoring environment that the Pittsburgh Pirates create, and the combination of the two was a .194 wOBA. Sabermetrics gives us a tool for calculating match-ups known as a log5 calculation. If we assume that a .298 wOBA is the Bucs’ real talent level, we can isolate Strasburg’s instantaneous talent level and give the most rational possible answer to the question “what if he’d faced a real lineup?”

I’ll cut to the chase and save you some algebra: his instantaneous wOBA-against was .218.

Going back to our log5 calculation, that means the Yankees, had they brought their MLB-leading .361 wOBA to face Strasburg last night, would have wOBA’d .243. That’s something like the Astros without Lance Berkman or Hunter Pence.
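A sketch of the arithmetic, using the odds-ratio form of log5. The .330 league wOBA is my assumption; the post doesn’t state the league figure it used:

```python
def log5(batter, pitcher, league):
    """Odds-ratio (log5) expected rate for a batter/pitcher matchup."""
    num = batter * pitcher / league
    den = num + (1 - batter) * (1 - pitcher) / (1 - league)
    return num / den

def solve_pitcher(observed, batter, league):
    """Invert log5: back out the pitcher's rate from an observed result."""
    odds = (observed / (1 - observed)) / (
        (batter / (1 - batter)) * ((1 - league) / league))
    return odds / (1 + odds)

LEAGUE_WOBA = 0.330  # assumed league environment, not from the post

stras = solve_pitcher(0.194, 0.298, LEAGUE_WOBA)  # Pirates hit .298, held to .194
print(round(stras, 3))                            # 0.218
print(round(log5(0.361, stras, LEAGUE_WOBA), 3))  # Yankees' .361 -> 0.243
```

With that league assumption, the solved instantaneous wOBA-against matches the post’s .218, and the Yankees line matches the table below.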

Here’s the same calculation for every other team in the league.

Yankees		0.243
Red Sox		0.240
Reds		0.234
Brewers		0.227
Tigers		0.226
Twins		0.225
Blue Jays	0.223
Rays		0.223
Braves		0.222
D-backs		0.220
Rangers		0.219
Cardinals	0.219
Phillies	0.219
Rockies		0.219
Royals		0.217
Nationals	0.216
Dodgers		0.216
Cubs		0.215
Marlins		0.213
Mets		0.213
Angels		0.212
Giants		0.212
White Sox	0.210
Athletics	0.210
Padres		0.202
Indians		0.202
Orioles		0.196
Mariners	0.195
Pirates		0.194
Astros		0.182

Anyway, we’ll never know what St. Stephen would have looked like against one of 29 other clubs on June 8th, 2010. That’s not the point. The point is that we witnessed one of the great pitching performances in the history of baseball. It was dominance. This post sheds a little light on what dominance means.

A Proposal for Replay in 30 Seconds or Less

You probably already know why I’m writing this post (but if you’re reading this in 2014: Armando Galarraga was robbed of a perfect game, with two outs in the 9th, on 6/2/2010).

Replay gets discussed a lot these days, and there are those in favor of it and those against it. The main reason for it is that replay is more accurate than an umpire, which has been demonstrated over and over again. There are two reasons against it. One is the tradition of umpires, the other is that it will slow down the game. As far as I can tell, it’s futile to argue with people when they love the tradition enough: if you’re sufficiently committed to tradition, no other value will persuade you to give up your stance. I don’t share that love of tradition, but I won’t enter a futile argument here either.

The lost time due to replay is different. We can count seconds, and we can try to balance lost time against increased accuracy. Moreover, as technology improves, the time once lost to rewinding tapes and whatever else they had to do in the NFL in the 1980s goes away; theoretically, 100% of review time is actually spent making a decision. How much time is worthwhile? We’d have to have a discussion about that, but I’m going to throw out 30 seconds: if we could have 30-second replay, it would be worth it. A controversial call on the field typically eats more than thirty seconds anyway, because umpires huddle (but never change the ruling) and managers come out on the field to argue the call (which never has any effect except getting the manager ejected).

Still, it takes a long time to review a play from every angle and come up with the best judgment the video evidence supports. You just couldn’t do all that work in 30 seconds, so it looks like we’ll have to settle for a longer review time or sacrifice the accuracy we desire.

That is a mistake. The reason replay takes so long is that we assume the goal is to produce the best judgment the video evidence supports, using time the way we should in a courtroom, where no minute is more valuable than the freedom of the innocent-but-accused. The reviewer must check the play from all angles. He must double-check it. He must confer with other reviewers and come to a consensus. It’s an inefficient system, which in criminal trials is fair and good, but it’s not good for entertainment.

Replay doesn’t have to be a courtroom. Give five reviewers access to all the available video. Give each 30 seconds to decide what the call should have been. Then they vote. They don’t talk about it; they just vote their own best judgment, with no changing a vote once cast. The majority rules. Suppose for a moment that each reviewer has a .75 probability of making the right call. Then the probability that the majority is correct is .896 (a binomial-distribution calculation). Such reviewers would botch the call, as a group, just 1 time in 10. Furthermore, in the preponderance of replay cases the video evidence is completely clear-cut, and it takes less than 30 seconds to make a determination that’s right with a probability of 1.
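The majority-vote figure is a straight binomial tail sum:

```python
from math import comb

def majority_correct(n, p):
    """Probability that a strict majority of n independent reviewers,
    each correct with probability p, gets the call right."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(round(majority_correct(5, 0.75), 3))  # 0.896
```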

This 30-second replay system would eliminate the vast majority of erroneous calls in baseball. It wouldn’t be a fail-safe system. It would require abandoning our standard of demanding evidence that fully justifies a conclusion to everyone, such that no one could reach a better conclusion from the same evidence. But we shouldn’t let perfection be the enemy of the good. And it’s a good thing to preserve perfection.

Brian Matusz: A Curious Case of Control Issues

Brian Matusz entered the 2010 season considered one of the top rookie pitchers in the game. Marc Hulet of FanGraphs recently wrote that he thinks Matusz will win the AL Rookie of the Year. Things haven’t been so rosy for the Baltimore lefty in the last month, however.

Matusz hasn’t pitched excellently in 2010, according to his peripherals. He’s better than his 5.8 ERA, but worse than his 3.8 FIP on account of an unsustainably low 6.2% HR/FB. His 4.6 xFIP looks a lot more like the pitcher he is. He’s striking out a lot of hitters (7.4 per 9), but his fly-ball tendencies are extreme (46.5% over 446 batters faced) and his control looks spotty: 3.3 BB/9 this season.

It’s the last part I want to take a look at, because a career 3.1 BB/9 is going to wreck a fly-ball pitcher like Matusz unless he can develop swing-and-miss stuff that gets him a strikeout per inning. But here’s the thing: Matusz’s control has been impeccable throughout his major league career.

54.3% of his pitches have hit the zone since he first took the mound in 2009. For comparison’s sake, just four pitchers managed to throw that percentage of pitches (or more) in the zone last season: Ted Lilly, Cliff Lee, Johan Santana and Roy Oswalt (who tied Matusz’s 54.3%). Just below that are Justin Verlander and Scott Baker. Matusz has allowed a free pass to 7.6% of the hitters he has faced since his arrival. Among those six comparables, the highest walk percentage was 6.5% (Santana), the lowest 4.4% (Lee), and the average 5.6%. Somehow Brian Matusz manages to walk 35% more batters than the average of this group of similar pitchers.

It gets even stranger. Matusz throws pitches in the zone, but he’s not exactly a pitch-to-contact guy. His contact rate on pitches in the zone (83.3%) is about four points below the major league average (~88%). He does tend to allow contact on pitches out of the zone, but the net result is an average overall contact rate, and his swinging-strike rate of 9.3% is about a point better than the major league average. Nor is he throwing pitches in the zone because he gets behind in the count: his 62.0% first-strike rate would place him around the top 15% of starters with 60 or more IP last season.

If we adjust his walk rate to reflect what pitchers with a similar rate of pitches in the zone do, Matusz would have just 15 walks this season and a 3.0 K/BB ratio. That’s good enough to produce a 4.2 xFIP and 3.6 FIP. (I calculated the expected walk rate as 6% of batters faced. I calculated expected HR for xFIP as 6.5% of line drives plus fly balls, i.e., balls in the air, rather than as a proportion of fly balls, to account for inconsistencies in LD and FB scoring.)
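A minimal sketch of those two adjustments; the batters-faced and batted-ball totals below are hypothetical stand-ins, since the post doesn’t give Matusz’s exact 2010 numbers:

```python
def expected_walks(batters_faced, rate=0.06):
    """Walks expected at the 6%-of-BFP rate of similar zone% pitchers."""
    return rate * batters_faced

def expected_hr(line_drives, fly_balls, rate=0.065):
    """xFIP-style expected HR: 6.5% of balls in the air (LD + FB)."""
    return rate * (line_drives + fly_balls)

# Hypothetical totals, for illustration only
print(expected_walks(250))   # 15.0
print(expected_hr(80, 120))  # 13.0
```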

While it’s difficult to tell a lot about a pitcher from his plate-discipline stats, two things do stand out: swinging strikes get Ks, and pitches in the zone prevent walks. The correlation coefficient between BB/BFP and Zone% was -.43 across the 78 qualified starting pitchers of 2009, which is a pretty strong correlation for baseball.
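That kind of figure is a plain Pearson coefficient. A minimal sketch, with toy data rather than the 2009 pitcher sample:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data, not the real sample: higher zone% loosely tracking lower walk rates
zone = [0.48, 0.50, 0.52, 0.54, 0.56]
bb_rate = [0.10, 0.09, 0.095, 0.07, 0.065]
print(round(pearson(zone, bb_rate), 2))  # -0.91
```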

It would be really cool if I had an explanation for Brian Matusz’s high walk rate, but it confounds me, and I’m completely open to suggestions. His walks look like an anomaly.