Archive for Outside the Box

Twin Dynasties – How One Trade Could Have Altered Baseball in the 1980s

In the winter between the 1980 and 1981 baseball seasons, one of the best catchers of all time informed his club, the Cincinnati Reds, that he would no longer catch more than two days each week.

What follows is a speculative rewrite of history. What did happen is that the 1981 Reds played Johnny Bench at first base 38 times, where his fielding percentage was .983 — not bad, but not quite the .995 clip of regular first baseman Dan Driessen. Bench contributed eight home runs, one more than Driessen, and batted over .300, the only time in his career he achieved that mark.

But what if Reds general manager Dick Wagner, the man who dismantled the Big Red Machine, took exception to the demand, and dealt with Bench like he did Tony Perez, Pete Rose, Joe Morgan, and Sparky Anderson?

“If Johnny wants to come to the Phillies, I’ll be happy to find another position.”

The words could have been considered tampering. The speaker could not have cared less.

The speaker was Pete Rose, doing what Pete always did, having fun with the sportswriters. Why not? His Phillies were world champs, and there was no reason to think they couldn’t repeat, just like his Reds teams did in the mid-70s. Back then, he had one of the greatest players at his position alongside him in Johnny Bench, just like he did now in third baseman Mike Schmidt.

The Phillies didn’t really have room for Bench, what with the solid Bob Boone behind the plate, Schmidt at third, Bake McBride in left (with young Lonnie Smith ready to take over), and the newly arrived Gary “Sarge” Matthews in right field. Sarge had averaged over 20 home runs and 70 RBIs across the four years before for the dreadful Atlanta Braves as one of the few bright spots for that woeful franchise.

Pete was about to turn 40, but he felt strong. His knees were still good, and as long as he had those, he felt like he could not only play, but play at the high level to which he’d grown accustomed.

He didn’t really think much of his comment, but when it made it to the papers in Tampa, Reds GM Dick Wagner thought about it. A lot.


You Wouldn’t Have Noticed If MLB Had Ties in 2018

There are a few articles, including one by Travis Sawchik, arguing that tie games might not be as bad for baseball as you think. The truth is that not only would ties have had no impact on who reached the postseason in 2018, but they would have shaved four minutes off the average game time.

Using regular expressions to parse box score data from Retrosheet, I’ve looked at how the 2018 season would’ve been different without extra innings. Here’s a look at the postseason standings as they were compared to how they would’ve looked with ties (scored 3 points for a W, 1 point for a T, and 0 for a L):

With ties, the 2018 postseason still has the same cast of characters, although the Dodgers and the Rockies would have swapped places in the NL West, causing the Dodgers to go to the Wild Card game.
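The 3/1/0 scoring described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the post: the input format (one tuple per game with each team's runs after nine innings) and the function name `tie_standings` are assumptions made for the example.

```python
from collections import defaultdict


def tie_standings(games):
    """Rank teams by points: 3 for a win, 1 for a tie, 0 for a loss.

    `games` is a hypothetical list of tuples
    (home_team, away_team, home_runs_after_9, away_runs_after_9),
    i.e. every game is cut off after nine innings.
    """
    points = defaultdict(int)
    for home, away, home_runs, away_runs in games:
        if home_runs > away_runs:
            points[home] += 3
            points[away] += 0  # make sure the loser still appears in the table
        elif home_runs < away_runs:
            points[away] += 3
            points[home] += 0
        else:
            points[home] += 1
            points[away] += 1
    # Most points first; ties in points are broken alphabetically.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))


# Made-up results, for illustration only.
example_games = [
    ("LAD", "COL", 3, 3),  # level after nine, so it counts as a tie
    ("LAD", "SF", 5, 2),
    ("COL", "SF", 1, 4),
]
standings = tie_standings(example_games)
```

With real data, the same function would be run once per division and the resulting order compared against the actual standings.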

That’s only looking at 2018. When examining the past five seasons, I found that the postseason implications of tie games would be pretty minimal.

In the plot below, each point represents one team’s season. The X-axis is the number of games that would end in ties and the Y-axis is the number of places a team would’ve moved in their division.

For simplicity, I’m defining postseason implications (PS Implications) as a team missing or making a Division No. 1 or Wild Card No. 1 or No. 2 with the scoring system described above.



The Reds May Have Andrew Miller 2.0

Andrew Miller has an undeniably nasty slider. As a Red Sox fan, I remember it far too well from the 2016 postseason. Big Papi’s farewell tour didn’t seem all that fair when you consider the way the Red Sox ran into the buzz-saw that was Miller and the Cleveland Indians. Sure, I’m grateful for Miller helping the 2013 version of the Red Sox win a third world title since 2004, but come on Andrew, you had to ruin Papi’s goodbye?

With Miller’s recent signing with the St. Louis Cardinals, I found myself exploring his FanGraphs page. I stumbled upon this article, Andrew Miller on the Evolution of his Slider, and I instantly began to wonder if pitchers had similar experiences developing their sliders in the 2018 season. The first step in this analysis was to evaluate the evolution of Miller’s slider.

What jumps off the page is the change in velocity. Miller saw a 4.6 mph increase in his slider from 2011 to 2012, then another 3 mph added from 2012 to 2013. This in large part had to do with Miller moving from a starting role to a relief role during his time with the Red Sox. Given that information, however, an increase in velocity that drastic not only shows a pitcher’s willingness to adapt, but also a pitcher’s ability to adapt. By observing Miller’s slider splits, we see that ability to adapt almost immediately.



Why We Love Power Pitchers

Heat. Smoke. Velocity. Stuff. Gas. Cheese.

I’m sure there are other words to describe our beloved “fireballers” (see, there’s another one). Pitchers who throw at high speeds are treated like fine china — see Stephen Strasburg in the 2012 postseason. I’m guilty of falling victim to the allure of a 98-mph fastball, regardless of its location. We love it, and, frankly, we’d like to see more of it. Major League Baseball has created a setting in which if a pitcher doesn’t break 90 mph with his fastball, he’s considered a “finesse” pitcher, or even a “soft-tosser” if left-handed. We love strikeouts, especially when a power pitcher blows a fastball by a hitter. But why?

Matt Harvey was stellar in 2015. He’s not so good anymore. Why do teams keep giving him second chances? Mostly because he throws hard.

However, it’s not entirely our fault. After reading Thinking, Fast and Slow by Nobel Prize-winning psychologist Daniel Kahneman, I began to understand why this happens. It shows how you can overcome cognitive bias, but in order to do so, you have to understand which one of your “thought systems” is making that decision for you. He explains that each human being has essentially two modes of thought.

System 1 – fast, instinctive, and emotional (gut feeling)

System 2 – slower, more logical (critical thinking)


Not Saying Derek Jeter is a Genius, but….

Trading away your team’s best players is never going to make you popular. You’ve probably read plenty about how the return for Marcell Ozuna was pretty good for the Marlins, while the return for Stanton was pretty thin. But savvy baseball fans understand that when you trade players, you’re not only trading their production, but also their contracts – so offloading an insane 13-year, $325M contract might not return as much as a team-friendly contract for a lesser player. Add in the fact that Stanton had a no-trade clause (and thus a ton of leverage over where he was traded), and the fact that the Marlins got anything in return for him is actually impressive. The Yankees took on practically all of Stanton’s remaining contract, so in context, this was a fine deal for the Marlins. Dee Gordon, though contact-and-speed types typically don’t sustain a lot of value into their 30s (and Gordon enters this year at 30), has put together 3.8 WAR/162 across his last four seasons, so maybe they could’ve gotten a little more out of that deal. But again, they were able to get rid of Gordon’s entire contract, which is guaranteed until his age-33 season of 2020.

The trade that stuck out most to me was the one for Christian Yelich. Yelich is an established star in the league who is still very young and has lots of upside, won’t be a free agent until 2023 (accounting for a team-friendly option in 2022), and seems like the type of player you might want to keep, even in a rebuild. They did receive top prospect Lewis Brinson and others in return, but of all the deals they made this one was, to me, the most indicative of “holy crap Jeter has no idea what he’s doing.”

And then, I realized, maybe he’s a genius.

Well, it doesn’t take a genius to recognize that Yelich is a future star, if he isn’t rightfully considered one already. It takes some genius (and perhaps a few gift baskets for your fans?) to say tear it all down. The Marlins could’ve kept any or all of Yelich, Ozuna, and even Stanton, but they’d still have been bad for the foreseeable future. The past four seasons they won 77, 71, 79, and 77 games. It’d have been easy to continue to toil in mediocrity, maybe even make a wildcard or two. But mediocrity is pointless in a business that overtly rewards losing.

You’re saying you want us to lose? No, we’ve BEEN losing. What I want is for us to finish dead last.
-Derek Jeter (probably).

It’s not a secret that tanking is now an actual strategy employed by “rebuilding” teams. I was surprised to learn in my research that tanking is probably not a new phenomenon (the percentage of teams who win 70 or fewer games is fairly consistent over the past several decades) but the game has changed so significantly in the era of free agency, “service time,” and revenue sharing, that the financial benefits of tanking should probably not be legal (but that’s for the CBA to determine). 2018 could be the worst year ever in terms of the number of teams not trying to compete.

Is that wrong? “Tank and bank” isn’t a purely theoretical exercise anymore. As you probably know, the past two World Series winners were responsible for some of the most blatant, disgusting, glorious middle-fingers-to-the-league you could ever imagine – and their paths coincide almost directly.

2008: the Cubs were an aging but solid team that led the NL in wins, with a dangerous lineup and a restored version of Kerry Wood, now a closer. They were bounced early in the playoffs, however, in the same year Joe Maddon came up just short of an unlikely World Series title with the Rays. That same year, the Astros were competitive – winning 86 games – but came up short of a playoff berth.

Both teams achieved Marlins-esque mediocrity in 2009 and 2010, and that’s when the tanking (er, rebuilding) began. The Astros were the most aggressive and flagrant in their process, and many people forget just how bad they were. They won just 56 games in 2011, followed by campaigns of 55 and 51 wins (that’s three straight seasons of 106+ losses). Their payroll went from $77M in 2011 to $67M in 2012 to $25M in 2013, and then, somehow, they cut it in half during the season by shedding even more salary. Notably, and not coincidentally, the Astros got a new owner in 2011. That historically bad 2013 for the Astros was actually historically great: they had the most profitable season in MLB history.

While the Cubs also lost a bunch of games during that same time period, they had a pretty big advantage over the Astros: they hired Theo Epstein (all due respect to Jeff Luhnow, whose roundabout career path is worthy of its own article). I’m not going to try and give Jeter or his staff a current/future grade as it pertains to winning lopsided trades but let’s just assume the Marlins are more like the 2011 Astros than the 2011 Cubs. Their “competitive advantage” over teams who may have better guys in analytics/baseball ops is that they can lose lots of games.

Currently, the Marlins are projected to win the fewest games in baseball which would of course net them the #1 overall pick. Picking first is certainly no guarantee of success (ahem, Kris Bryant went #2 to the Cubs in 2013 while the Astros picked up Mark Appel at #1) but it’s objectively better to pick in the top 2 or 3 than, say, outside of the top 5. There is also the correlated benefit of turning a bigger profit by fielding a lower payroll. To put it simply: if you’re going to miss the playoffs anyway, make as much money as possible while getting the best draft pick you can. It’s easy to say “I wouldn’t have traded Yelich/Ozuna/Stanton” in an attempt to appease your fan base (who aren’t coming to games anyway) while not having personally invested hundreds of millions of dollars into a team; but when your expensive team has little chance of even making the playoffs (never mind winning a World Series) the business side of things becomes even more important.

Based on the aggressive trades the Marlins have made to shed payroll, expect them to mirror the ’11-’13 Astros financially: they have about $80M committed this year, about $50M in 2019, but only $23M in 2020; $22M of that is to Wei-Yin Chen, who I’m sure the Marlins hope can stay healthy long enough to generate a little interest from a contender. Righty-specialist and all-time home run preventer Brad Ziegler (making $9M) should have enough appeal to anyone who gets tired of giving up homers to the right-handed heavy Yankees or Angels lineups, and Junichi Tazawa (making $7M) might have a few buyers as well. Justin Bour (age 30, $3.4M, arb-eligible) should find a home with a competitor – possibly best fit with the aforementioned Angels or even Yankees depending on how Greg Bird recovers, given their respective needs for some left-handed power options. Perhaps they can package the no longer desirable Martin Prado (2yr, $28.5M) with the very desirable J.T. Realmuto (age 27, $2.9M, arb-eligible) to shed some more salary.

By year 5 of their rebuild, both the Cubs and Astros blossomed into legitimate competitors, before winning their World Series in years 6 and 7 respectively (and being in great position to compete for years to come). Marlins fans probably don’t want to hear “2022” as the best case scenario for their team to begin competing…but competing for a World Series doesn’t come easy. And as I’m sure Astros and Cubs fans could attest, it’s worth the wait.


Sizing Up the “Most X of the Decade” Races; Plus Bonus Trout Stuff

Admittedly, this is a bit of a stupid topic. These distinctions are often thrown around with an air of importance that is far from earned. Nobody ever mentions that Hank Aaron had the most RBIs in the 1960s. You don’t need to talk about arbitrary endpoints with the Hammer. Mentioning that a player holds one of these “records” is a bit like saying a guy you are trying to set a friend up with has a great personality. It’s likely that this is covering up for something else like a hat covers a balding head.

(*before you head to the comments to blast me, yes, RBI is an incredibly useless statistic)

But they are fun! Hank Aaron did lead in RBI in the 1960s, but he finished second to Harmon Killebrew in home runs. Growing up into a baseball fan in the 1990s, one of the bigger surprises in this genre of record was the hit leader of that decade: Mark Grace. I’d imagine every Cub fan knows this. In addition, anyone who knows a chatty Cubs fan probably knows this, too. Looking at the most recent decade, there are a few surprises: Miguel Tejada had the most games played and at-bats. Andy Pettitte edged out Randy Johnson for most wins. Johnson, along with Alex Rodriguez, dominated most of the categories.

Moving to the point of this article, here is a quick rundown of compelling and not so compelling races to have the most X of this decade, with two seasons remaining:

WAR

You, the Fangraphs reader that goes into the depths of the Community Research section, probably know who leads this category, despite spending 2010 in A ball. But it is a bit closer than you might have thought. While Trout could conceivably win the position player award while sitting out the next two seasons (Joey Votto is the only player that might topple him in this scenario, and he is 9.1 WAR behind), this is the one major statistical category in which position players and pitchers compete with each other.

If I just jogged your memory of that, you probably know the name that is coming: Clayton Kershaw. Trout leads with 54.4 WAR. Kershaw has 52.1 WAR on the pitching side, but has also accrued 1.8 WAR as a batter. That should count. So Kershaw, at 53.9 WAR, is directly on Trout’s heels. Trout is still the heavy favorite, but Kershaw has a puncher’s chance, especially if another injury befalls Trout.

Totally made up odds: Trout, 98%; Kershaw, 2%

Hits

Jose Altuve has put up four straight 200-hit seasons, but he is 251 off the pace and in 15th place. No, this title will most likely go to Robinson Cano. Cano leads with 1501, and the only players within sniffing distance are similarly on the downside of their primes. Miguel Cabrera is second at 1416, and then there is a slew of players, including Fangraphs favorite Nick Markakis, in the low to mid 1300s.

Cano should clear 150 hits the next two seasons, and if he does, he will not be passed. Cabrera, as bad as he looked at times last season, would be the likely beneficiary of some unforeseen collapse by Cano. Elvis Andrus is the end of that slew of players behind Cabrera. He has 1329 hits, but recorded 191 last season and is significantly younger than everyone in front of him. I’m going to give him sleeper status for this title.

Totally made up odds: Cano, 93%; Cabrera, 5%; Andrus, 2%

Home Runs

Currently, this is an incredibly close race, with four players within five homers of each other at the top of the list. Jose Bautista is first with 272. Edwin Encarnacion and Nelson Cruz are second and third, and then you get to Giancarlo Stanton in fourth place with 267. Other than Miguel Cabrera and the remnants of Albert Pujols, no one else is close. Stanton has to be the favorite here, but his status is extremely tenuous. First, let’s just get Bautista out of the way. He’s unemployed and several steps below the other players even if he does try to gut out two more seasons.

Without a doubt, it would be shocking if a healthy Stanton didn’t win this. But a healthy Stanton would be at least a little bit of a shock. The once-oft-injured Cruz and Encarnacion are 37 and 35, respectively, but are still mashing and project for mid-to-high 30-something homers apiece. Cruz has played four straight full seasons and E5 has three straight under his prodigious belt. Stanton is projected by Steamer for a literally—but not really literally—bananas number of 53 home runs. The Fans of Fangraphs are more modest, pegging him with only 48. But Stanton is injury prone. You all know that. There is no argument that he is not. So this is a fairly open race.

Totally made up odds: Stanton, 55%; Encarnacion, 25%; Cruz, 20%

RBI

Again, I do know this is a stupid statistic. But artificial endpoints of decades are pretty stupid, too, so this is fitting. This is Miguel Cabrera’s title to lose, and as long as he plays, he should easily win. And guys making the cash Cabrera is due for the next thousand years generally get every opportunity to play. Sitting behind Cabrera’s 860 RBI are Albert Pujols at 806 and Robinson Cano at 789. The aforementioned Edwin Encarnacion and Nelson Cruz round out the top five with 763 and 756 respectively.

If Cabrera falters, this looks like it would be a wide-open race. Pujols achieved the remarkable 100+ RBI season while losing 2.0 WAR last year. He likely will do much worse, but as long as he is playing, he will continue to accrue a decent number of RBI. E5’s Indians outscored the M’s by 68 runs last year and seem to be a better offensive team, but Cano does have a 26 RBI lead. Honestly, this looks like a virtual toss-up if Cabrera doesn’t win, but the idea of Edwin Encarnacion or Nelson Cruz leading the decade in home runs and RBI is rather delicious.

Totally made up odds: Cabrera, 80%; Cano, 7%; Encarnacion, 6%; Cruz, 4%; Pujols, 3%

Stolen Bases

Be honest, which would you rather be known for: a surprise answer to the question “who stole the most bases in the second decade of the new millennium?” or hitting an epic World Series Game Seven home run… for the losing team. Rajai Davis might say porque no los dos? Davis has the most stolen bases this decade with 301. However, he is actually a longshot to keep this title. Davis just signed a minor league deal with the Indians that includes a non-roster invitation to spring training. He will likely struggle to ever get regular playing time again. He’s 37 years old.

This will likely come down to a race between the two guys behind him. Dee Gordon has 278 stolen bases, had 60 last year, and only turns 30 in April. He has a 35 stolen base lead on 3rd place, which would seem more insurmountable if that person was not arguably a full tick or two faster. Billy Hamilton has 243 stolen bases since coming into the league in 2013, and has been remarkably consistent, stealing one more base each year than the year before. The fans think he’ll do that again this year, hitting 60 stolen bases. Hamilton is over two years younger than Gordon, and might be faster, but the 35 stolen base edge Gordon enjoys makes him the clear favorite.

Totally made up odds: Gordon, 66%; Hamilton, 33%; Davis, 1%

Wins

This is likely a two-person race between Max Scherzer and Clayton Kershaw who have 132 and 131 wins respectively. Justin Verlander and Zack Greinke sit enough off the pace at 123 and 122 to make a comeback very improbable absent an injury, but close enough to make a comeback very possible if both players in front of them miss significant time.

Moving to who I give the edge to: there just isn’t a lot separating these two. Scherzer is older, but Kershaw has had a bit more in the way of nagging injuries lately. If it truly were a push going forward, I could just go with Scherzer since he is one ahead at the moment. But I’m going to give Kershaw the ever-so-slight edge because the Dodgers are almost assuredly going to be one of the best teams in baseball the next two years while the Nationals might only have that status for the next season. Verlander gets the nod as more likely spoiler for a similar reason: the Astros are ballers.

Totally made up odds: Kershaw, 45%; Scherzer, 43%; Verlander, 7%; Greinke, 5%

Saves

Craig Kimbrel should put this away by midseason. At 291, he ranks 61 ahead of Kenley Jansen and Fernando Rodney, who are tied for second with 230. Aroldis Chapman sits in 5th with 204. Kimbrel’s consistency and consistently light usage should ensure that he continues to rack up saves the next two seasons. Even a repeat of his comparatively modest 66 saves over the last two seasons would give him a realistic lock on this honor.

If Kimbrel does fall apart, the 30-year-old Jansen would be the likely beneficiary, as he has a much stronger hold on his 9th inning role than the 40-year-old Rodney. While Kimbrel might have this decade locked down, he will likely fall short in his quest to surpass Rivera’s total from the last decade. Rivera saved 397 games that decade. It should be noted, however, that Kimbrel barely pitched in the majors in 2010 and recorded only one save. On the other hand, he’s already blown one more save than Mariano did all of last decade.

Totally made up odds: Kimbrel, 97%; Jansen, 2%; Rodney, 1%

Strikeouts

Stop me if you’ve heard this before, but this is a two-man battle between Max Scherzer and Clayton Kershaw. Only this time, there is a much clearer favorite. Scherzer leads Kershaw 1909 to 1835. He essentially built that entire lead last season when he recorded 66 more strikeouts than the limited Kershaw. But Kershaw’s innings shortfall was not the only thing at play here. Scherzer struck out 1.63 more batters per nine innings. For the decade, Scherzer has a 74 strikeout lead in only 14 1/3 more innings. The only realistic path for Kershaw to overtake Scherzer is injury. Of course, with pitchers, injury is always a legitimate and significant risk.

Behind Scherzer and Kershaw is Justin Verlander with 1670. No one past Verlander has any legitimate shot barring a mass retirement of some of the game’s best starting pitchers. At the end of the day, this is really a question about health. But for Kershaw to overtake Scherzer, he’d not only need Scherzer to get hurt, but he’d have to stay healthy himself.

Totally made up odds: Scherzer, 79%; Kershaw, 19%; Verlander, 2%

Runs

This has been a very positive article, but let’s get a bit negative for a second. Which pitcher will give up the most runs in this decade? A big factor in this, of course, is that giving up a lot of runs is bad, and playing bad usually leads to not playing. You have to be good enough to get the ball on a regular basis, but bad enough to rack up the runs allowed. Our frontrunners are honestly not terrible pitchers. Rick Porcello leads the way with 789 runs allowed. James Shields is just behind with 778 runs allowed. Porcello is 29, started a playoff game last year, and is owed a lot of money through the end of the decade. He also won the Cy Young Award in 2016 while accruing 5.1 WAR on the mound. He should have opportunities to add to this total. Porcello underperforms his peripherals, but only by a little. He is basically good enough to never get moved out of the rotation, and durable enough to throw over 1500 innings this decade, but has a suddenly-not-great-for-the-era ERA of 4.29 over the last eight seasons.

James Shields is slated to maybe start opening day 2018. Unlike Porcello, Shields has been dreadful the past two seasons. Yeah, the White Sox starting pitching this year might be awful. Shields is owed a lot of money in 2018, but 2019 is an option that will only be picked up if Shields has a dramatic turnaround. Thus, there is a bit of a catch-22 here. If Shields plays well enough to keep getting the ball regularly into 2019, it seems unlikely that he’ll chase down Porcello. Of course, this could also come down to injury. If either player gets hurt, the other will very likely take this notorious (dis)honor.

Ubaldo Jimenez sits in 3rd with 734 runs allowed. He will thankfully have a hard time adding to that total. If Porcello and Shields find themselves with quick hooks and no jobs, there are a few possible dark horses, including Ervin Santana and Jon Lester, who at the very least should get two full seasons of starts barring injury. For this one, I’m just going to put the field as a third choice rather than trying to single out who might suck, but play.

Totally made up odds: Porcello, 60%; Shields, 30%; Field, 10%

Other Interesting Battles

(My favorites in italics)

Games: Robinson Cano, 1264; Alcides Escobar, 1250 (that would be something)

Runs: Ian Kinsler, 785; Miguel Cabrera, 741; Andrew McCutchen, 740; Robinson Cano, 738

Strikeouts: Chris Davis, 1266; Mark Reynolds, 1250; Justin Upton, 1249

HBP: Shin-Soo Choo, 98; Anthony Rizzo, 98; Chase Utley, 92

Games: Tyler Clippard, 576; Luke Gregerson, 551

Innings Pitched: Justin Verlander, 1705; Max Scherzer, 1670.2; Clayton Kershaw, 1656.1

HBP: Charlie Morton, 82; Justin Masterson, 77 (23 in AAA in 2017)

Balks: Clayton Kershaw, 17; Franklin Morales, 15; Johnny Cueto, 13

Bonus Trout Stuff

You will notice that, apart from WAR, Mike Trout was not mentioned at all in this article. Of course, Trout played zero MLB games in 2010 and only 40 in 2011. But he is also an all-around performer. He doesn’t even show up in the top 10 for most counting categories. So for the lazy, here is where Trout ranks in the decade (if among top 30, must be qualified for rate stats):

Triples: T-8th (40)

Home Runs: T-16th (201)

Runs: 8th (692)

RBI: 30th (569)

Walks: 7th (571)

Intentional Walks: T-14th (61)

HBP: T-27th (55)

Sac Flies: T-16th (40)

Stolen Bases: 17th (165)

Batting Average: 6th (.306)

OBP: 2nd (.410) (not close to first, Joey Votto, .438)

Slugging%: 1st (.566) (biggest threat is probably Giancarlo Stanton, at .554)

Trout is about to play his age 26 and 27 seasons to round out the decade. He’ll be an “old 27” with his August birthday. We don’t know how he’ll age. But it is possible that he plays his whole career, a career of an inner-inner-circle Hall of Famer, and never leads a decade in any traditional counting stat. This on top of his frustratingly low MVP totals. If nothing else does, perhaps that should tell you how stupid this whole exercise is, and how stupid rigid benchmarks for greatness are in general. If Trout were born three years earlier, he could have dominated the counting stat leaderboards of this decade. If he played for a better team, he could have 2-3 more MVP awards.

So what does it all mean? Probably not much. If Albert Pujols squeaks out the most RBI of the decade, will that make it less of a disappointment? Does Nelson Cruz having the most home runs over an arbitrary 10-year period mean that he’ll one day be enshrined in Cooperstown? Well, no. However, I hope you had fun. I know that I did.


Making Baseball Slow Again

If you’re a baseball fan, you may have noticed you’ve been watching on average 10-15 minutes more baseball than you were 10 years ago. Or maybe you are always switching between games like me and never stop to notice. If you’re not a fan, it’s probably why you don’t watch baseball in the first place: 3+ hour games, with only 18 minutes of real action. You are probably more of a football guy/gal, right? Believe it or not, NFL games are even longer and, according to a WSJ study, deliver even less action.

The way the MLB is going, however, it may not be long before it dethrones the NFL as the slowest “Big Four” sport in America (and takes away one of my rebuttals to “baseball is boring”). Currently, the MLB is proposing pitch clocks and has suggested limiting privileges such as mound visits.

Before I get into the specific proposal and the consequences of these changes, let me give you some long-winded insight into pace of play in the MLB.

A WSJ study back in 2013 broke down the game into about 4 different time elements:

  1. Action ~ 18 minutes (11%)
  2. Between batters ~ 34 minutes  (20%)
  3. Between innings ~ 43 minutes (25%)
  4. Between pitches ~ 74 minutes  (44%)
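The percentages in the list follow directly from the minute totals. A minimal sketch of that arithmetic, using only the four figures quoted above (the function name `time_shares` is made up for this example):

```python
def time_shares(minutes_by_element):
    """Return each element's share of total game time as a whole percent."""
    total = sum(minutes_by_element.values())
    return {
        name: round(100 * minutes / total)
        for name, minutes in minutes_by_element.items()
    }


# Minute totals as quoted from the 2013 WSJ breakdown above.
wsj_minutes = {
    "Action": 18,
    "Between batters": 34,
    "Between innings": 43,
    "Between pitches": 74,
}
total_minutes_per_game = sum(wsj_minutes.values())
shares = time_shares(wsj_minutes)
```

The four elements add up to 169 minutes, a little under three hours, and the rounded shares match the percentages in the list.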

The time between pitches or “pace” is what everyone is focused on, and rightly so. It makes up almost twice as much time as any other time element and is almost solely responsible for the 11-12 minute increase in game length since 2008. Don’t jump to the conclusion that this is all the fault of the batter dilly-dallying or the pitcher taking his sweet time. This time also includes mound conferences, waiting for foul balls or balls in the dirt to be collected, shaking off signs and stepping off, etc. Even if we take all of those factors out, there are still two other integral elements that increase the total time between pitches: the total batters faced and the number of pitches per plate appearance (PA). If either of these increases, the total time between pitches will increase by default. In the graph below, I separated the effects of each by holding the rest constant to 2008 levels to see how each factor would contribute to the total time added.
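One way to picture the "hold the rest constant" idea is to treat total time between pitches as roughly batters faced × pitches per PA × seconds per pitch, then swap in one later-season value at a time. The sketch below is an assumption about how such a decomposition could be done, not the author's code, and every number in it is hypothetical rather than a real league average.

```python
def total_minutes(batters_faced, pitches_per_pa, seconds_per_pitch):
    """Approximate time between pitches in one game, in minutes."""
    return batters_faced * pitches_per_pa * seconds_per_pitch / 60.0


def isolated_effects(baseline, later):
    """Minutes added by each factor alone, with the others held at baseline."""
    base_total = total_minutes(**baseline)
    effects = {}
    for factor in baseline:
        scenario = dict(baseline)          # start from the baseline season
        scenario[factor] = later[factor]   # change exactly one factor
        effects[factor] = total_minutes(**scenario) - base_total
    return effects


# Hypothetical per-game averages, NOT real league figures.
baseline_2008 = {"batters_faced": 80, "pitches_per_pa": 4.0, "seconds_per_pitch": 30.0}
later_season = {"batters_faced": 78, "pitches_per_pa": 4.1, "seconds_per_pitch": 31.5}

effects = isolated_effects(baseline_2008, later_season)
```

Because the three factors multiply, the isolated effects will not sum exactly to the full change between the two seasons; the small remainder is the interaction between them.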

Any modest game time reduction due to declining total batters faced was made up by a surge in pitches per PA. Increasing pace between pitches makes up the rest.

As we have heard over and over again in the baseball world, the average game time has increased, as is evident in the graph above. It’s not just that the number of long outlier games has increased; the median game time has actually crept up by about the same amount.

Plenty of players are at fault for the recent rise in game time. You can check out Travis Sawchik’s post about “Daniel Nava and the Human Rain Delays” or just check out the raw player data at FanGraphs. Rather than list the top violators here, I thought it would be amusing to make a useless mixed model statistic about pace of play.

A mixed model based statistic, like the one I created in this post, helps control for opposing batter/pitcher pace and for common situations that result in more time between pitches. Essentially, for the time between each pitch, we allocate some of the “blame” to the pitcher, batter, and the situation or “context”.

I derive the pace from PITCHf/x data, which contains details about each play and pitch of the regular season. I define pace as the time between any two consecutive pitches to the same batter, excluding intervals that include pickoff throws, stolen bases, and other actions documented in PITCHf/x. (This is very similar to FanGraphs’ definition, but they calculate pace by averaging over all pitches in the PA, while I calculate by pitch.) For more specifics, as always, the code is on GitHub.
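To make the per-pitch definition concrete, here is a small sketch of how those intervals could be computed. It is not the code from the GitHub repository: the record layout (`pa_id`, `time` in seconds, and an `action_before` flag marking a pickoff throw, stolen base, or similar event since the previous pitch) is a made-up stand-in for the real PITCHf/x fields.

```python
def pitch_intervals(pitches):
    """Seconds between consecutive pitches to the same batter.

    `pitches` is a hypothetical, time-ordered list of dicts with keys
    'pa_id', 'time' (seconds), and 'action_before' (True when a pickoff,
    stolen base, or other logged action preceded the pitch).
    """
    intervals = []
    previous = None
    for pitch in pitches:
        same_batter = previous is not None and pitch["pa_id"] == previous["pa_id"]
        if same_batter and not pitch["action_before"]:
            intervals.append(pitch["time"] - previous["time"])
        previous = pitch
    return intervals


# Made-up pitch log for illustration.
example_pitches = [
    {"pa_id": 1, "time": 0, "action_before": False},
    {"pa_id": 1, "time": 20, "action_before": False},
    {"pa_id": 1, "time": 65, "action_before": True},   # pickoff throw: excluded
    {"pa_id": 1, "time": 85, "action_before": False},
    {"pa_id": 2, "time": 130, "action_before": False},  # new batter: no interval
    {"pa_id": 2, "time": 152, "action_before": False},
]
intervals = pitch_intervals(example_pitches)
average_pace = sum(intervals) / len(intervals)
```

Averaging these intervals directly gives a per-pitch pace; averaging within each PA first and then across PAs would weight short plate appearances more heavily, which is the difference from the FanGraphs approach noted above.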

It’s a nice idea and all, but does context really matter?

The most obvious example comes from looking at the previous pitch. Foul balls or balls in the dirt trigger the whole routine involved in getting a new ball, which adds even more time. The graph below clearly shows that time lags when pitches aren’t caught by the catcher.

The biggest discrepancy comes with men on base. Even though pickoff attempts and stolen bases are removed from the pace calculation, that still doesn’t account for the games pitchers play with runners on base, such as changing up their timing after coming set or stepping off the rubber to reset.

The remainder of the context I’ve included illustrates how pace slows with pressure and fatigue as players take that extra moment to compose themselves.

As the game approaches the last inning and the score gets closer, time between pitches rises (with the exception of a score differential of 0, since this often occurs in the early innings).

And similarly, as we get closer to the end of a PA from the pitcher’s point of view, pace slows.

Context plays a large part in pace, meaning that players who find themselves in notably slow situations are not completely at fault. I created the mixed model statistic pace in context, or cPace, which accounts for all of the factors above. cPace can essentially be interpreted as the pace added above the average batter or pitcher, but it can’t be compared across positions.

When comparing the correlation of Pace and cPace across years, cPace seems like a better representation of batters’ true tendencies. My guess is that pitchers’ pace varies more than hitters’ pace, so many batters’ cPace values benefited from controlling for the pitcher and other context.

After creating cPace, I came up with a fun measure of overall pace: Expected Hours Added Per Season Above Average or xHSAA for short. It’s essentially what it sounds like: how many hours would this player add above average given 600 PA (or Batters Faced) in a season and league average pitches per PA (or BF).
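As a rough sketch of the arithmetic, assuming cPace is measured in seconds per pitch and that every pitch in a PA carries the extra time (both are my assumptions, and the inputs are placeholders rather than any player's actual numbers):

```python
# xHSAA sketch: hours added above average over a 600-PA season.
# Assumes cPace is in seconds per pitch; the inputs below are placeholders.

SECONDS_PER_HOUR = 3600

def xhsaa(cpace_seconds, league_pitches_per_pa, plate_appearances=600):
    """Expected hours added per season above an average player."""
    extra_seconds = cpace_seconds * league_pitches_per_pa * plate_appearances
    return extra_seconds / SECONDS_PER_HOUR

# Hypothetical slow batter: 4.7 extra seconds per pitch, 3.9 pitches per PA.
print(round(xhsaa(cpace_seconds=4.7, league_pitches_per_pa=3.9), 2))
```

A negative cPace gives a negative xHSAA, meaning the player saves time relative to average.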

The infamous tortoise, Marwin Gonzalez, leads all batters with over 3 extra hours per season more than the average batter.

That was fun. Now back to reality and MLB’s new rule changes. Here is the latest proposal via Ken Rosenthal:

MLB tried to implement pace of play rules in 2015, one of which required batters to keep one foot inside the box, with some exceptions. The rules seemed to be enforced less and less, but an 18- or 20-second pitch clock is not subjective and will potentially have drastic consequences for a league that averages 24 seconds between pitches. Some sources say the clock actually starts when the pitcher gets the ball. Since my pace measure includes the time between the last pitch and the pitcher receiving the ball, the real pace relative to clock rules may be 3-5 seconds faster.

Let’s assume that it’s five seconds to be safe. If a pitcher takes 20 seconds between two pitches, we will assume it’s 15 seconds. To estimate the percentage of pitches that would be affected by these new rules I took out any pitches not caught by the catcher, assuming all the pitches left were returned to the pitcher within the allotted five seconds.

The 18-second clock would put about 14% of the pitches thrown with no runners on in 2017 in violation. This doesn’t even include potential limits on batters’ time outside the box or time limits between batters, so we can safely say this is a lower bound. If both of the clocks are implemented in 2020, at least 23% of all pitches would be in violation of the pitch clock (excluding the first pitch of each PA). Assume it only takes three seconds to return the ball to the pitcher instead of five, and that number jumps to 36%!
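The estimate boils down to subtracting the assumed ball-return time from each measured pace and counting how many pitches still exceed the clock. A minimal sketch, with made-up pace values standing in for the real PITCHf/x intervals:

```python
# Share of pitches that would violate a pitch clock, after subtracting the
# assumed time it takes to return the ball to the pitcher.
# The pace values below are made up for illustration.

def violation_share(paces, clock_seconds, return_seconds=5.0):
    """Fraction of pitches whose adjusted pace exceeds the clock."""
    adjusted = [pace - return_seconds for pace in paces]
    over = sum(1 for seconds in adjusted if seconds > clock_seconds)
    return over / len(adjusted)

paces = [18.0, 22.0, 24.5, 26.0, 30.0]

print(violation_share(paces, clock_seconds=18))                      # 0.6
print(violation_share(paces, clock_seconds=18, return_seconds=3.0))  # 0.8
```

Shortening the assumed return time from five seconds to three leaves more of each interval counting against the clock, which is why the violation share rises.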

And now we are on the precipice of the 2018 season, which could produce the longest average game time in MLB history for the second year in a row as drastic changes loom ahead. I don’t know who decided that 3:05 was too long or that 15 minutes was a good amount of time to give back to the fans. Most likely just enough time for fans to catch the end of a Shark Tank marathon.

Anyways, if game times keep going up, something will eventually have to be done. However, even I, a relatively fast-paced pitcher in college, worry that pitch clocks will add yet another element to the countless factors pitchers already think about on the mound.

There are certainly some other innovative ideas out there: Ken Rosenthal suggests the possibility of using headsets for communication between pitchers and catchers, and Victor Mather of the NYT suggests an air horn to bring in new pitchers instead of the manager. Heck, maybe it’ll come down to limiting the number of batting glove adjustments per game. Whatever the league implements will certainly be a jolt to players’ habits and hardcore baseball fans’ intractable traditionalist attitude. The strategy, technology, and physicality of today’s baseball are changing more rapidly than ever. When the rules catch up, I have a feeling we will still like baseball.

 


Who Obtains the Most Assistance in Pitcher Welfare?

Nobody’s perfect, especially umpires. This is the case at any level of the game. Be it softball, tee ball, or baseball, from Little League to the Big Leagues, you have undoubtedly disagreed with a call that an ump has made.

Given the movement, velocity, and the newly anointed skill of pitch framing, it’s becoming more difficult for umpires to get the calls right. The robo ump has been discussed quite a bit but I’m not sure how I feel about a machine making decisions in lieu of accepting the concept of human error. We did it for decades before instant replay was instituted.

Umpires get balls and strikes wrong a lot. It’s the way it goes. Given that understanding, I wanted to know which pitcher has in recent years been the beneficiary of favorable calls.

And, like the umpires, not all (strike zone) charts are 100% accurate; leave a little room for error here.

I’ve parsed data on which pitchers have had the most declared strikes that were actually out of the zone. I decided to stop at 2014 because I felt that four years of information was sufficient for the study.

First, the accumulated data.

From 2014 to 2017, the number of pitchers with phantom strikes has been increasing at a fairly high rate; the biggest leap was from 2014 to 2015 (36 pitchers).

chart (4)

Interestingly, the number of pitchers with at least 100 ‘phantom strike’ calls has actually decreased.

chart (6)

And, despite the jump in total pitchers involved from ’14 to ’15, the pitchers with <=100 strikes called decreased at the highest rate.

Should we go tin foil hat and infer that umps are no longer favoring certain pitchers as much as they used to? Doubtful, but I’m not investigating integrity here.

So who is getting the most benefit from the perceptively visually impaired? First, I took the last four years of pitching data for our parameters. Then, I cut the final list down to a minimum of 10,000 pitches thrown. Lastly, I included only the top 20 pitchers in the group.
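Those three steps can be sketched in a few lines. The pitcher names and totals here are invented placeholders, and the per-pitcher counts are assumed to have been tallied already from called strikes located outside the zone.

```python
# Filter to pitchers with enough pitches, rank by phantom strikes, keep the
# top N, and attach the phantom-strike percentage.
# Pitcher names and totals are invented placeholders.

def top_phantom(pitchers, min_pitches=10_000, top_n=20):
    """Return the top_n eligible pitchers by phantom strikes."""
    eligible = [p for p in pitchers if p["pitches"] >= min_pitches]
    ranked = sorted(eligible, key=lambda p: p["phantom_strikes"], reverse=True)
    return [
        {**p, "phantom_pct": 100.0 * p["phantom_strikes"] / p["pitches"]}
        for p in ranked[:top_n]
    ]

pitchers = [
    {"name": "Pitcher A", "pitches": 12_500, "phantom_strikes": 540},
    {"name": "Pitcher B", "pitches": 9_800, "phantom_strikes": 600},  # below cutoff
    {"name": "Pitcher C", "pitches": 11_000, "phantom_strikes": 410},
]

for row in top_phantom(pitchers):
    print(row["name"], row["phantom_strikes"], round(row["phantom_pct"], 2))
```

Ranking by raw count favors pitchers who simply throw more pitches, which is why the percentage is worth carrying along as well.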

20PhantomStrikes

As we can see, Jon Lester of the Chicago Cubs has been the most aided overall; 562 non-strikes in four years.

For the optically minded, here is the pitch chart of Lester’s data.

Jon Lester
That’s A LOT of Trix!

Now, let’s see if the percent of pitches has any impact on our leader(s).

20PhantomStrikesPercent

Not a whole lot of variance, at least near the top. Lester clearly wins The MLB Umpires’ “Benefit of the Doubt Award”.

OK, so now we’ve got our man. Case closed, right?

Oh…that little caveat of ‘pitch framing’. Perhaps it’s that Lester has had great framing from his catchers. Let’s look into that.

For the moment, we are going to focus on Lester and his primary catcher from 2014-2016, David Ross.

dRossLester

Clearly 2014 was Lester’s most favorable year with Ross. That year, Lester ranked third in total pitches called favorably out of the zone (156) and 11th in ratio of calls (4.47).

The subsequent years with Ross are as follows:

2015- 6th (141), 10th (4.43)
2016- 5th (125), 7th (3.95)

Here’s where things get a bit intriguing. Recapping 2017, things appear to fall apart completely for the Cubs in the context of pitch framing.

2017CubsFraming

The only catcher who was able to garner a positive framing rating was Kyle Schwarber, who caught just seven innings that year. But even his stats are far from impressive.

And how did Lester fare in terms of ‘phantom strikes’ that year? He ranked first in overall strikes called out of the zone (150) and fourth in total call ratio to pitches thrown (4.46).

He wasn’t all that far from the top under Ross, but was basically the frontman of the metrics in 2017.

Some things are hopelessly lost in the sphere of the unexplained. But the research didn’t set out to find reasoning. In this case it’s more fun to be left with subjective theories. However, it’s a bit silly to think that there is actually an umpire conspiracy allowing Lester to succeed when he apparently shouldn’t.

My best guess is maybe they feel sorry for him since he can’t accurately throw the ball in the infield anywhere other than to the catcher (which did change a bit in 2017)?

Regardless, Lester is our guy here, receiving a sizable edge in terms of missed calls. It will be interesting to see if this trend continues this season.


Using Statcast Data to Predict Future Results

Introduction

Using Statcast data, we are able to quantify and analyze baseball in ways that were recently immeasurable and uncertain. In particular, with data points such as Exit Velocity (EV) and Launch Angle (LA) we can determine an offensive player’s true level of production and use this information to predict future performance. By “true level of production,” I am referring to understanding the outcomes a batter should have experienced, based on how he hit the ball throughout the season, rather than the actual outcomes he experienced. As we are now better equipped to understand the roles EV and LA play in the outcome of batted balls, we can use tools like Statcast to better comprehend performance and now have the ability to better predict future results.

Batted Ball Outcomes

Having read several related posts and projection models, particularly Andrew Perpetua’s xStats and Baseball Info Solutions’ Defense-Independent Batting Statistic (DIBS), I sought to visualize the effect that EV and LA had on batted balls. For those unfamiliar with the Statcast measurements, EV is represented in MPH off the bat, while LA represents the trajectory of the batted ball in Vertical Degrees (°) with 0° being parallel to the ground.

The following graph shows how EV and LA together can explain batted ball outcomes and allows us to identify pockets and trends among different ball in play (BIP) types.

 

The following two density graphs were created to show the density of batted ball outcomes by EV and LA, without the influence of one another.

As expected, our peaks in density are located where we notice pockets in Graph 1. Whereas home runs tend to peak at 105 MPH and roughly 25°, we see that outs and singles are more evenly distributed throughout and doubles and triples fall somewhere in between, with peaks around 100 MPH and 19°. These graphs substantiate the understanding that hitting the ball hard and in the air correlates to a higher likelihood of extra-base hits. I found it particularly interesting to see that triples resembled doubles more than any other batted-ball outcome in regards to EV and LA densities. Triples are often the byproduct of variables such as larger outfields, defensive misplays, and batter sprint speed, which are three factors not taken into account during this project.

Expected Results

My original objective in this project was to create a table of expected production for the 2017 season using data from 2017 BIP. Through trial and error, I shifted my focus towards the idea that I could use this methodology to better understand the influence expected stats using EV/LA can have in predicting future results. With the implementation of Statcast in all 30 Major League ballparks beginning in 2015, I gathered data on all BIP from 2015 and 2016 from Baseball Savant’s Statcast search database. In addition, I created customized batting tables on FanGraphs for individual seasons in 2015, 2016, and 2017 for all players with a plate appearance (PA).

After cleaning the abundance of Statcast data that I had downloaded, I assigned values of 0 and 1 to all BIP, representing No Hit or Hit respectively, and values of 1, 2, 3, and 4 for Single/Double/Triple/Home Run respectively. Comparing hits and total bases to their FanGraphs statistics for all individuals, I made sure all BIP were accounted for and their real-life counting statistics matched. Following this, I created a table of EV and LA buckets of 3 MPH and 3°, along with bat side (L/R) and landing location of the batted ball (Pull, Middle, Opposite), using Bill Petti’s horizontal spray angle equation. While projection tools often take into account age, park factors, and other variables, my intention was to find the impact of my four data points and to tell how much information this newly quantifiable batted-ball data can give us.

By calculating Batting Average (BA) and Slugging Percentage (SLG) for every bucket, we can more accurately represent a player’s true production by substituting in these averages for the actual outcomes of similar batted balls. For instance, a ball hit the opposite way by a RHB in 2015 and 2016 between 102 and 105 MPH and 21° and 24° was worth .878 BA and a 2.624 SLG, representing the values I will substitute for any batted ball hit in this bucket.

While a player’s skills may be unchanged, opportunity in one season can be tremendously different from the following, affecting individual counting statistics. With a wide range of factors that can lead to changes in playing time, from injuries to trades to position battles, rate statistics are steadier when looking at year-to-year correlation than counting statistics. Typically rate statistics, such as BA and SLG, will correlate better because they remove themselves from the variability and uncertainty of playing time, which counting statistics are predicated heavily on. Totaling the BA and SLG for each individual batter’s BIP from the 2015 and 2016 season, I was able to then divide by their respective at-bats for that year to determine their expected BA (xBA) and SLG (xSLG).
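The bucketing and substitution steps can be sketched roughly as follows. This is not the code used for the project: the field names are hypothetical, the landing location is assumed to be labeled already (the spray angle equation is not reproduced here), and the handful of batted balls is made up.

```python
# Bucket batted balls by EV/LA/stand/location, compute BA and SLG per bucket,
# then sum the bucket values over a player's BIP and divide by at-bats.
# Field names and sample data are hypothetical.
from collections import defaultdict

def bucket(value, width=3):
    """Lower edge of the width-wide bucket containing value."""
    return int(value // width) * width

def bucket_key(bip):
    return (bucket(bip["ev"]), bucket(bip["la"]), bip["stand"], bip["location"])

def bucket_averages(bips):
    """Map each bucket to (BA, SLG) over the batted balls in it.
    'bases' is 0 for an out and 1-4 for single through home run."""
    totals = defaultdict(lambda: [0, 0, 0])  # count, hits, total bases
    for bip in bips:
        entry = totals[bucket_key(bip)]
        entry[0] += 1
        entry[1] += 1 if bip["bases"] > 0 else 0
        entry[2] += bip["bases"]
    return {key: (h / n, tb / n) for key, (n, h, tb) in totals.items()}

def expected_line(player_bips, averages, at_bats):
    """xBA and xSLG: bucket values summed over a player's BIP, per at-bat."""
    x_hits = sum(averages[bucket_key(b)][0] for b in player_bips)
    x_bases = sum(averages[bucket_key(b)][1] for b in player_bips)
    return x_hits / at_bats, x_bases / at_bats

league = [
    {"ev": 103.0, "la": 22.0, "stand": "R", "location": "Opposite", "bases": 4},
    {"ev": 104.2, "la": 23.5, "stand": "R", "location": "Opposite", "bases": 2},
    {"ev": 102.1, "la": 21.0, "stand": "R", "location": "Opposite", "bases": 0},
    {"ev": 88.0, "la": -4.0, "stand": "R", "location": "Pull", "bases": 1},
    {"ev": 89.5, "la": -5.5, "stand": "R", "location": "Pull", "bases": 0},
]
averages = bucket_averages(league)

# A hypothetical player with two BIP and four at-bats (two strikeouts).
player = [league[0], league[3]]
print(expected_line(player, averages, at_bats=4))
```

Dividing by at-bats rather than BIP is what lets strikeouts drag xBA and xSLG down, just as they do for the real statistics.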

Year-to-Year Correlation Rates For BA/SLG/xBA/xSLG to Next Season BA/SLG, 2015 to 2016 / 2016 to 2017

Season (Min. 200 AB Per Season)

Statistic    2015 to 2016    2016 to 2017
BA           0.140           0.173
xBA          0.163           0.179
SLG          0.244           0.167
xSLG         0.301           0.204

While the correlation rates for xBA and xSLG are only modestly stronger from season to season than those of their BA and SLG counterparts, we are seeing some positive steps towards predicting future performance. The thing that stands out here is the decline in the SLG and xSLG correlations from 2015/2016 to 2016/2017, and my suspicion is that batters are beginning to use Statcast data. It is widely known that a “fly-ball revolution” has been taking place and many players are embracing this by changing their swings and trying to elevate and drive the ball more than ever. With a new record in MLB home runs in 2017, I would not be surprised to see our correlation rates jump back up next season, as the trend has now been identified and our batted-ball data should reflect that.

By turning singles, doubles, triples, and home runs into rate statistics per BIP, we are able to put aside the playing time variables and apply these rates to actual opportunities. Similar to calculating xBA and xSLG, I created a matrix of expected BIP rates (xBIP%) for each possible BIP outcome (x1B%, x2B%, x3B%, xHR%, xOut%). In other words, for each bucket of EV/LA/Stand/Location, I calculated the percentage of all batted-ball outcomes that occurred in that bucket (e.g. 99-102 MPH/18-21°/RHB/Middle: x1B% = 0.012, x2B% = 0.373, x3B% = 0.069, xHR% = 0.007, xOut% = 0.536), and summed the outcomes for each batter, giving their expected batting line for that season.
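A minimal sketch of that summing step: each batted ball contributes its bucket's outcome shares to the batter's expected line. The first bucket uses the example values quoted above; the second bucket and the batter's batted balls are invented for illustration.

```python
# Sum bucket-level outcome shares over a batter's BIP to get an expected
# batting line. The second bucket and the batter's BIP list are made up.

OUTCOMES = ("1B", "2B", "3B", "HR", "Out")

xbip = {
    # Example bucket quoted in the text: 99-102 MPH / 18-21 degrees / RHB / Middle.
    (99, 18, "R", "Middle"): {"1B": 0.012, "2B": 0.373, "3B": 0.069,
                              "HR": 0.007, "Out": 0.536},
    # Hypothetical bucket for illustration only.
    (84, 9, "R", "Pull"): {"1B": 0.600, "2B": 0.050, "3B": 0.000,
                           "HR": 0.000, "Out": 0.350},
}

def expected_batting_line(bip_buckets, rates):
    """bip_buckets: one bucket key per batted ball hit by the batter."""
    line = dict.fromkeys(OUTCOMES, 0.0)
    for key in bip_buckets:
        for outcome in OUTCOMES:
            line[outcome] += rates[key][outcome]
    return line

batter_bip = [(99, 18, "R", "Middle"), (99, 18, "R", "Middle"), (84, 9, "R", "Pull")]
print(expected_batting_line(batter_bip, xbip))
```

Three batted balls here yield fractional expected singles, doubles, and so on, which only turn into a recognizable batting line once a full season of BIP is summed.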

Using this information, I wanted to find the actual and expected rates per BIP for each possible outcome (actual = 1B/BIP, expected = x1B/BIP, etc.) and apply these to the next season’s BIP totals. For example, by taking the 2B/BIP and x2B/BIP for 2015 and multiplying by 2016 BIP, I can find the correlation rates for actual and expected results, with disregard to opportunity and playing time in either season. Below are the correlations from 2015 to 2016 and 2016 to 2017, with both their actual and expected rates applied to the BIP from the following season.
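Here is a rough sketch of that calculation for doubles, assuming a plain Pearson correlation (the text does not specify which correlation was used). The four batters are invented, so the printed value means nothing beyond showing the mechanics.

```python
# Apply a season-one rate per BIP to season-two BIP, then correlate the
# projection with what actually happened. Batter data is invented.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def projected(count, bip, next_bip):
    """Season-one rate per BIP applied to season-two BIP."""
    return count / bip * next_bip

# Hypothetical batters: expected doubles and BIP in season one,
# BIP and actual doubles in season two.
batters = [
    {"x2b": 28.0, "bip": 400, "next_bip": 420, "next_2b": 31},
    {"x2b": 19.5, "bip": 350, "next_bip": 300, "next_2b": 15},
    {"x2b": 35.2, "bip": 450, "next_bip": 460, "next_2b": 33},
    {"x2b": 22.1, "bip": 380, "next_bip": 410, "next_2b": 27},
]

projections = [projected(b["x2b"], b["bip"], b["next_bip"]) for b in batters]
actuals = [b["next_2b"] for b in batters]
print(round(pearson(projections, actuals), 3))
```

Running the same steps with actual 2B in place of x2B gives the comparison row, and swapping in next-season home runs as the target gives a cross-outcome correlation.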

Correlation Rates For Actual and Expected Batted Ball Outcomes, 2015 to 2016 / 2016 to 2017

Season (200 BIP Per Season)

Statistic    2015 to 2016    2016 to 2017
1B           0.851           0.843
x1B          0.871           0.865
2B           0.559           0.594
x2B          0.624           0.644
3B           0.173           0.262
x3B          0.107           0.098
HR           0.628           0.608
xHR          0.662           0.617

Looking at the above table, the expected statistics have a higher correlation to the following season’s production than a player’s actual stats. The lone area where actual stats prevail in our year-to-year correlations is projecting triples, which should come as no surprise. Two noticeable areas that this study neglects to take into account are park factors and batter sprint speed. Triples, more than any other batted-ball outcome, rely on these two factors, as expansive power alleys and elite speed can very easily turn doubles into triples.

One interesting area where this projection tool flourishes is x2B/BIP to home runs in the following season. By taking the x2B/BIP and multiplying by the following season’s BIP and then running a correlation to the home runs in that second season, we see a tremendous jump from the actual rate in season one to the expected rate in season one.

Correlation Rates of 2B/x2B To HR In Following Season, 2015 to 2016 / 2016 to 2017

Season (200 BIP Per Season)

Statistic    2015 to 2016    2016 to 2017
2B -> HR     0.381           0.322
x2B -> HR    0.535           0.420

Conclusion

With this information, we can continue to understand the underlying skills and more accurately determine expected future offensive production. By continuing to add variables to tools like this, such as age, speed, and park factors, as many projection models have done, we can incrementally gain a better understanding of the question at hand. This research attempted to show the effect EV/LA/Stand/Location have on batted balls and how that data can help us find tendencies, underlying skills, and, ultimately, competitive advantages.

With strong correlation rates between xBIP% and the next season’s actual results, it is exciting to find another area of baseball that gives us the information and ability to better understand players and their abilities. With the use of Statcast, we are looking to create a better comprehension of what has happened and how we can use that to know what will happen, and it appears that we have.


Do Fielders Commit More Errors Playing Out of Position in a Shift?

The shift has taken MLB by storm in recent years. Broadcasters love to criticize the shift, despite its numerous advantages. One potential problem that the shift may cause is an increase in fielding errors. This may be a direct result of fielders playing out of their normal position. Using the shift data provided to FanGraphs courtesy of Baseball Info Solutions, as well as batted ball data courtesy of Baseball Savant, I ran a logistic regression to find the likelihood of a batted ball resulting in a fielding error.

The variables included in the regression were release speed, a hitter-pitcher matchup dummy (with a value of 1 if the pitcher and hitter were both righties or both lefties), a runners-on-base dummy, launch speed (exit velocity), effective speed, launch angle, and dummy variables for both traditional and non-traditional shifts. The model only included batted balls that were hit in the infield, as the majority of shifts occur in the infield.
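For readers less familiar with logistic regression, here is a small sketch of how a fitted model of this shape turns a batted ball's features into an error probability. The intercept and coefficients are invented placeholders chosen only to show the mechanics; they are not the fitted values from this study, and only a subset of the variables is included.

```python
# How a logistic model maps features to a probability of a fielding error.
# The intercept and coefficients are invented placeholders, not fitted values.
from math import exp

def same_hand_dummy(pitcher_throws, batter_stands):
    """1 if pitcher and hitter are both righties or both lefties, else 0."""
    return 1 if pitcher_throws == batter_stands else 0

def error_probability(features, coefficients, intercept):
    """Logistic function applied to the linear combination of features."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))

# Hypothetical coefficients: signs follow the intuition that harder, lower
# batted balls and shifted fielders are associated with more errors.
coefficients = {
    "launch_speed": 0.010,
    "launch_angle": -0.020,
    "same_hand": 0.050,
    "traditional_shift": 0.250,
}
intercept = -5.0

grounder = {
    "launch_speed": 98.0,
    "launch_angle": -5.0,
    "same_hand": same_hand_dummy("R", "R"),
    "traditional_shift": 0,
}
shifted = dict(grounder, traditional_shift=1)

print(error_probability(grounder, coefficients, intercept))
print(error_probability(shifted, coefficients, intercept))
```

A positive coefficient on a shift dummy raises the predicted probability for an otherwise identical batted ball, which is how a shift effect would show up in a model like this.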

 

Screen Shot 2017-12-23 at 2.01.19 AM

Above are the results of the logistic regression used to determine the probability of a batted ball being an error. The dependent variable is whether or not the error occurred. Two results that logically make sense are Exit Velocity (Launch Speed) having a positive coefficient and Launch Angle having a negative coefficient. Both of these variables are significant at the 1% level. Exit Velocity having a positive coefficient shows that the harder the ball is hit, the harder the ball is to field. Launch Angle having a negative coefficient means that the lower the angle (a ground ball rather than a fly ball), the more likely the fielder is to commit an error. Both of these results are consistent with research that has been conducted in the past. The most interesting results from the model are both traditional and non-traditional shifts leading to an increased likelihood of an error occurring. Both variables were statistically significant at the 5% level, and suggest that players struggle more in the field when playing out of their normal position.

While teams are unlikely to change their shifting patterns (more good comes out of the shift than bad), they must take into account which fielders are worse when playing out of position.

Despite the increased probability of an error occurring, I still believe that the positives outweigh the negatives when it comes to shifting. In future research, it would be interesting to look at this data on a minor league level, as well as to see if fielders who shifted more in the minors are more prepared to field out of position in the majors.